Novice's Notebook

This is a repository of "novice" articles, written with the intent of driving more traffic to the site, and getting more ad clicks. It's pretty crass, I know, but the information may be very useful. Some of the content is adapted from the diy notes, and other notebooks, which are a bit rougher than these.

Most of these articles are not authoritative, because they're based on what I'm learning, as I'm learning it.

Anti-Virus Problem: a Hacked Shell that Won't Run EXE Files

I'm starting to forget this one already, but, recently, I dealt with a virus that hacks the shell and inserts a handler for .EXE files.

You know how, when you double click on .DOC files or .XLS files, the system automatically opens them with the correct application? The way that works is that Windows Explorer has a mapping that describes how to start a .DOC or .XLS file with Word or Excel.

The normal thing for .EXE files is, basically, to do nothing. Just run it as-is.

A virus might alter this, so that, instead of running the EXE file, it runs another program first. (Then, that program will run the EXE file as normal, so that you don't notice something's wrong.) The particular instance I had popped up a window telling me I had a virus, and clicking on their button would sell me a product to remove it.

At least these intruders were somewhat honest.

After finding this problem with RegEdit, I edited out the issue. However, I also made a serious error, and effectively disabled my ability to execute any EXE file.

Getting out of this situation was difficult. What I ended up doing was figuring out a way to execute a file without passing it through the shell.

The solution was to use the Scheduled Tasks to run a .BAT file. That .BAT file contained a line that started RegEdit. I think this task was scheduled to run as Administrator so it would be able to save the registry (but I may be wrong on that part).

The point here is that Scheduled Tasks don't run through the shell's file associations - they launch programs directly, the way the old Command Line Interface (CLI) does.

I guess the big question is - why can't I just run the CLI? Well, the CLI is also an EXE file - it's cmd.exe. So when you try to run that through the shell, it ends up being intercepted.

Anti-Pattern: Working With Live Data

I recently lost a chunk of data while I was developing a nice little macro to produce a report. How it was lost is pretty sad. I had become used to pressing a few keys to clear out my spreadsheet, and I accidentally pressed those keys on a spreadsheet of the live data. Pffft. Data vanished.

Luckily, I had most of the data in another document and restored some of it, but the rest was gone for good. All this was due, not to faulty code, but to my failure to create a development sandbox.

Yes, this was only a macro, but, even for something so simple, it's smart to make a separate place to develop it. This sandbox would have contained a copy of the data.

A sandbox is better than a backup. That's because the sandbox is a minimal subset of what you need to write your program. The real deployment environment is usually a lot more complex. To back up the real environment, so it's safe to develop in there, could be more difficult than you could imagine, and take a long time, too.

I tell everyone "work on a copy of the data, not on the original." Well, "physician, heal thyself," is what I should be told. I needed to work on a copy of the data, and not the original.

Backup Book

How to Backup is a free online mini-book explaining basic ideas about how to backup your network, backup technologies, and backup strategies to keep your systems online, and your data available.

How to Backup is a simple read. It doesn't get too theoretical. It doesn't cover enterprise backup - it's for small businesses and home offices.

You think you know what a backup is, but, do you really?

What is backup?

A backup is a copy of your data.

A backup is an archived copy of your old data.

A backup is a system that can be used to deliver your data, if the primary system fails.

A backup is a system that keeps operating, transparently, even if part of the system fails. It's fault-tolerant.

A backup lets you recover from bad data, quickly.

A backup with frequent incremental backups lets you undo a huge run of bad data.

A backup is a part of a system that costs less than the entire system, that allows nearly all people to keep working in the event of an equipment or data failure.

Links to variations on this booklet

How to Backup the Network at Home

Warning: Parts of this document are obsolete for larger disks. What follows is a short report on a failure.

I recently had a RAID5 array fail, and learned something about backup: it's not just about the data, but also the recovery time. It took several hours to bring the system to a state of usability, and several days of work (including bringing someone in to help migrate files) to reach a state of relative safety. Subsequently, we decided on a two-server setup built around incremental disk images, which should make recovery within an hour feasible.

A RAID array with 700 gigabytes of data takes hours, even days, to back up. It takes even longer to restore, because writes take longer.

Exchange recovery proceeds incredibly slowly. A seemingly small 30 gigabyte database took what seemed like half a day to recover.

These two facts can put you in a situation where you have all your data on backups, maybe even multiple backups, but recovering from a failure will take a very long time, forcing the entire business offline for a day, or longer, costing hundreds of dollars per hour (or more) until the system is fixed. That doesn't even include the real value of the work, which (as any leftist would tell you) is greater than the costs of doing the work.

This is an unacceptable situation. Ultimately, it's a good value to spend a few thousand dollars to have a redundant system on-site. Buy enough capacity for two servers, use them both all the time, and when one fails, move all the work onto one server for a few weeks (until the new system is sent and configured).

(If you need to convince your boss to get redundant servers, print this article out.)

RAID nightmares

Large disks are statistically more likely to suffer read errors. Today, all disks ship with defects and simply map them out, so they need to be scanned continually to find and remap new bad sectors.

A RAID5 array failure can be difficult to fix. When a disk fails, you can replace it, but if you haven't been running the background consistency check feature for months, it won't be able to rebuild the array successfully: during the recovery you are likely to suffer a read error and then the entire array will go offline.

It's better not to replace the failed disk. Instead, force the entire array back online, and then perform a file-level backup, and restore to a fresh disk. Don't run a consistency check, because that will cause the RAID controller to take the array offline when it hits the error. Doing a file-level backup seems to be more tolerant of errors, or maybe the sectors with errors are just less likely to be read.

Forcing the array online will allow the business to continue operating. Just be aware that the array is damaged and all the data needs to be migrated off of it. It's a zombie disk, undead, and no new data should go on it.

Install a fresh disk, and start migrating all the active data to it, and migrate users onto that disk. This won't take much time, because your active data set is typically small. It'll only take a few hours to do this for most scenarios. It won't be so easy for older server-dependent software, but for newer software with a cleaner separation between client and server, it'll be easy. Set up a frequent backup for this data.

If you haven't started a full backup, do so, and all the older files will be covered by this backup.

If the C: drive is on the array, you will need to image the partition and then move it to the new disk. This is tricky (and we called in our consultant to do it). I'm not sure how to do it, but it requires knowledge of the Windows boot sequence, and may require editing the boot.ini file and the registry so it'll try the new partition first, and totally ignore the old partition.

(This isn't any easier than on Linux. The lesson I learned is that being able to manipulate or even recover and create the boot sequence is a must-know skill for sysadmins. It's also hard to learn and practice, requiring spare hardware and whatnot.)

Once the system is on stable new disks, you have to re-unify the active and old data. I used WinMerge, a great file comparison tool, to do this.

For backups, I used NTBackup - it was an old system. NTBackup has flaws where it'll just fail to save some data. It's also very "quiet" about this - you have to read the final report. I used the error report to build a file list that NTBackup could use to perform an additional backup. Usually, this second try would result in all the files being saved.

Restoring data onto Server 2012 and Server 2008 R2 was weird, because the new OSes don't use NTBackup. You need to dig around to find tools to restore from NTBackup bkf files. The tools work fine.

The newer backup tools are all centered around disk images. The built-in tools don't do incremental backups, so you need to find a 3rd party solution for those. We're going to use ShadowProtect, which is sold by our consultant. I don't know the price, but the market rate for Windows backup with incremental backups is around $1000.

For an equivalent disk-image-based backup system on Linux, you use either software or hardware RAID (I prefer software) and use LVM volumes and virtual partitions. You use "snapshots" to freeze the disk state, and compare disk states. The differences are copied to another computer with a mirror of the partition (via rsync). The main problem is that system performance with snapshots is worse, so you have to work around that.

Archives and Archiving Files and Documents

Archiving is different from backups. Think about them separately.

An archive is an organizational strategy for data. It's a structure into which data can be stored in a way that makes it easy to retrieve the data in the future.

There are a few different ways to organize information. To use some computer terms: "tables", "time", and "hierarchy".

Tables refers to database tables, where data is organized into records and fields (or rows and columns). A record is a unit of data, like a row in a list. A field is information about the data, or the data itself, like the columns in a row. The useful property of a table is that every row has the same columns, so you can sort and group by columns.


A hierarchy is like a filing system of folders.

Chronological organization is to organize information by time, so you can retrieve the data from a specific time period.

The computer's file system uses all three methods of organization. Each file has common fields, like the modification time, size, and usually a file extension.

The files are stored in a hierarchy, and people typically name the folders uniformly. This uniform naming breaks up the filename into fields, so it's easier to sort through the files.

For more info, see the file naming convention articles below.

The file system generally lacks the ability to add extra fields of data. For example, it would be useful to be able to attach major and minor version numbers to every file. While there are some ways to do this, there isn't a simple way that's exposed through the user interface.

Consequently, the folder hierarchy is usually used instead of extra fields. It's not a bad or good thing - it's just how we do it. For some examples of this, see the folder organizing articles below.

Good archiving can assist backups by breaking the file system into parts. For example, if the folders are organized by client, you can split up the backups by client. Then, you can direct archives for old clients onto specific media, which might be kept offline or offsite. With very little work, you can cut down the time required to backup adequately -- and that translates into a greater capacity for the entire backup system.

File naming convention with dates

The file naming convention I use starts the name with a date: YYMMDD-file-name.ext

If I'm making revisions, I add initials and revision numbers separated by a dot or a dash: YYMMDD-file-name-x.ext or YYMMDD-file-name.x.ext

Similar conventions are used for folder names.

Though the system adds modification times, I still put the date into my file name, because the system's time and date can be lost. If a file is emailed, the creation date can be lost. Putting the date in the filename helps retain this extra data.

Using the date in the filename also helps with naming. Typically, I'm working on things for other organizations or people (for money), so I can name a file with the date and the other party's name. As new files are created, I don't have to invent new file names.

If there are multiple projects, just add the project name. The date assures that there's no need to invent new names all the time.
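A tiny shell sketch of the convention - note that the client name "acme" and the document name "report" are made up for illustration:

```shell
# Build a YYMMDD-prefixed file name for a client document.
datestamp=$(date +%y%m%d)              # e.g. 090815 for Aug 15, 2009
filename="${datestamp}-acme-report.doc"
echo "$filename"
```

Because the date comes first, a plain alphabetical listing of the folder is also a chronological one.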

File naming conventions for routing documents past multiple editors

In a typical office, several people have to read a document - the writers, the editors, a manager, the signatory to the document, and possibly some artists.

In many offices, this is carried out over email. The problems with this technique are multiple, but for the backup administrator, the main one is that each mailed copy consumes space on the mail server, wasting storage and network resources.

It also fails to scale up past small documents. Imagine editing long documents this way - it's not realistic.

The standard solution is to have everyone work on a shared file system.

Some offices use a system of "folders" where a document is edited, and versions are moved from one folder to another -- each folder acting as a kind of inbox and workspace. The folders within a project may be named "source", "edit", "review", "signed". Specific people look at each folder, and work on the contents within.

Some offices use project names, but other use project numbers. Numbers may actually work better than names, because people are generally good at mapping numbers to names, but not as good going the other direction (think about how much easier it is to see a phone number and identify the caller than it is to remember a phone number). Not only that, but, numbers are more precise than words -- people won't mix up "9099" and "9080", but they may mix up "Ford" and "Ford Foundation" and thus create confusion.

Some offices alter the file name of a document as it's modified. For example, you start with a document named "2010-Tribe.doc". As it gets edited, the file accumulates editor initials: "2010-Tribe.a.doc" then "2010-Tribe.aj.doc", and so forth as each person reviews the work.

Because the name changes, the backup software that runs every night will save each revision of the file separately. Similarly, if you use a file syncing software, you can accumulate revisions onto your backup.
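The renaming step above can be sketched in a few lines of bash. This is only a sketch following the "2010-Tribe.doc" example: it assumes names start with a date or year, so a trailing all-lowercase dot-segment can be treated as the initials:

```shell
# Add an editor's initials to a routed document's name.
add_initials() {
    local file=$1 ini=$2
    local base=${file%.*} ext=${file##*.}
    local last=${base##*.}                        # segment just before the extension
    case $last in
        [a-z]*) echo "${base}${ini}.${ext}" ;;    # append to existing initials
        *)      echo "${base}.${ini}.${ext}" ;;   # first reviewer starts a segment
    esac
}

add_initials 2010-Tribe.doc a      # -> 2010-Tribe.a.doc
add_initials 2010-Tribe.a.doc j    # -> 2010-Tribe.aj.doc
```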

File naming conventions for websites

Websites are archives. A website that isn't an archive is one that displays a lot of "404 errors" - file not found.

Perhaps more than other kinds of archives, it's important to plan the archive to accept new files for a long period of time. That's because websites get links, or what some call "deep links", which are links to pages beyond the so-called "home page". (I think it's a stupid distinction - a home page is only for branding and frequent users, and there are few of the latter. Most traffic comes from links and search engines.)

When you rename or move files, you break all the links out there - and links are the fabric of the web.

To avoid this problem, you have to break up your system into manageable chunks, and you have to do it from the start.

If you expect to upload new image files every day, you should plan to have a system that can handle 365 files per year, and 3,650 files per decade. A single folder might be sufficient for the first 365 files, but, things get unwieldy at 3,650 files if you have to look at the files and pick them. Even the network will slow down when you get a file listing.

The solution for that is to use dated folders.
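A dated folder can be created on the fly in a shell. The demo below uses a throwaway directory; in practice you'd point it at your real web root:

```shell
# Dated folders for a daily stream of image uploads: images/YYYY/MM/
webroot=$(mktemp -d)    # demo path only; use your site's root instead
mkdir -p "$webroot/images/$(date +%Y/%m)"
ls "$webroot/images"
```

At roughly 30 files per month-folder, even a decade of daily uploads stays manageable.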

You may get few images per year, except during events, when you get hundreds of images. The obvious solution is to create one folder per event.

I like to prepend the year to the event, so you get names that sort by date, like 09picnic. If that's not precise enough: 090815-picnic.


You can use upper and lower case, but at your peril. Windows and Mac are case-insensitive, but Unix is case-sensitive. That means in Unix, "Car.jpg" is different from "car.jpg", and both are different from "car.JPG".

On Windows and Mac, all three are the same file. The hazard is that you create the three files on Unix, and then copy them to a Windows or Mac, and end up with only one file (or an error).

The convention is to use all lowercase for naming files on Unix.

To avoid problems, rename your Windows files in lowercase if they are destined for the Web.
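The rename can be done in bulk with a small bash function. This is a sketch; the demo files below are invented, and the function deliberately refuses to overwrite an existing file:

```shell
# Rename the given files to all-lowercase names.
lowercase_all() {
    local f lower
    for f in "$@"; do
        lower=$(printf '%s' "$f" | tr '[:upper:]' '[:lower:]')
        [ "$f" = "$lower" ] && continue            # already lowercase
        [ -e "$lower" ] || mv -- "$f" "$lower"     # don't clobber an existing file
    done
}

demo=$(mktemp -d)        # throwaway directory for the demo
cd "$demo"
touch Car.JPG index.html
lowercase_all *
```

Usage on real web files would be `lowercase_all *` in the folder you're about to upload.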

Separate HTML files from image files?

Most websites have all the images in an images directory, while the HTML files are in other directories, or in the "root" of the server (the topmost directory).

This is probably because a single HTML file tends to include more than one image. Thus, as the site grows, moving images into their own directory just makes sense - it's a quick fix to the problem of growth.

Suppose each page includes three images. Then, each new page causes four files to appear on the server. 100 pages later, there are 400 files.

By moving images into a directory, the 100 pages cause only 100 visible new files in the directory.

Offline Email Archiving

If you keep old emails, and some of that information is sensitive, you should archive them offline on a computer that doesn't get connected to the network all the time. While this isn't failsafe, it does prevent intruders from accessing sensitive data on backups. (Need to explore backup security issues.)

Ways to set up offline backups vary. Below is an explanation of how to set up an ad-hoc system on Mozilla Thunderbird.

Set up a new Account of type "Other Accounts...". To do this, go into an existing account's settings, and below the list of accounts, there's a drop-down that lets you "Add Other Account...".

Choose the "unix mailspool" type. This is a file-based email drop. (Unix can deliver email on the local system via files.) Go through the rest of the configuration, and name it something like "offline archive".

Next, go into the "Server Settings" section of this account's settings. It will display the directory where the mail is stored. Click "Browse..." and change this to a directory on an external hard drive. (The hard drive must be connected and powered at this point.)

Once the account is established, move your data into this offline archive, using the regular Thunderbird methods of dragging and dropping.

Backup Laptops with a Dock

If you have a laptop that you travel with, consider getting a dock for your office desk. If your laptop isn't dockable (because it's a "home" laptop computer), then, get a universal dock. A universal dock is a dock with a USB connector, and an internal USB hub. (You might call it a glorified USB hub.)

To the hub, attach a USB hard drive or USB flash drive. The flash drive is better, because it uses less power.

Get some "sync" software that synchronizes folders. Some will initiate a sync when the drive is connected. Windows users may use a tool like Allway Sync or FolderClone. Set it up to backup the My Documents folder and perhaps the Desktop as well.

Every time you dock and log in, the software should sync and backup your important documents to the USB flash drive.

See the article backup external hard disk or usb flash drive for more ideas.

Backup Tapes

Backup tapes are a popular backup medium, but have recently become more expensive than disk. It's cheaper to use hard disks for backup.

Backup tapes have some advantages. They are smaller than disks, so you can pack more into a box, and send it to an archival location. Usually, backup tapes are stored in a cool, dry room. They are more durable, in that a shock to the backup tape won't cause failure, whereas disks may have a head-crash.

There are many different types of backup tape, from the old reel-to-reel formats to serpentine (QIC-style) cartridges, Travan, and DAT. The most common backup tapes out there are the 4mm cartridges used with DAT drives.

Enterprises (meaning businesses with scale and money) still buy backup tapes.

Consumers (meaning everyone else) have moved on to disk-based backups. Backup tapes cost more per megabyte than disks, unbelievably.

I guess there are a lot of enterprises overpaying for their data.

Backup to CD-R and CD-RW

Backup to CD shares a lot of problems that backup to DVD has, with some interesting differences.

The main difference is that CDs are roughly 1/7th the size of DVDs (about 700 megabytes versus 4.7 gigabytes), so you can't backup as much data. Consequently, the backup is "faster" because the data set is smaller.

So, CDs are basically not good for backing up your system, but, are a great way to make archival snapshots of your work-in-progress.

For example, if you wanted to retain 7 days of your past work, you can purchase 7 CD-RWs, and label them "Monday", "Tuesday", "Wednesday" and so forth. Put them in jewel cases, and then into a CD box.

Each day, either at the start or end of a day, run a backup of your work, and then store it. It won't take more than 10 minutes.

For your effort, you are rewarded with an archive of your most important data, at your fingertips.

Backup to DVD

Creating backups on DVD-R or DVD-RW allows you to store up to 4.7 gigabytes of data (or 8.5 if you use dual-layer DVDs).

The main advantages:

  • low cost
  • archival, by default
  • widely supported, and readers are common

The main disadvantages:

  • slow write speeds
  • limited capacity
  • data is easily damaged

If you are going to backup to DVD, get a SATA DVD burner.

Chances are, you're only going to do a data backup, so, make duplicates of all your installer CDs and DVDs first. Make a disc with all the downloaded installers, and all the serial codes.

Then, backup the data. You may need to partition your data on the disk, and set up different backup jobs, to spread the backup across multiple DVDs. Prepare to spend a lot of time waiting.

Another disadvantage is that you can't always choose backup software, because the burner may not work with generic DVD burning software.

That all said, a DVD is very light, and easy to mail. It's a great way to make a weekly backup of a large project that can be sent off-site "just in case". It also gives your client something solid in exchange for paying their weekly invoice.

Backup to External Hard Disk or USB Flash Drive

A simple, transparent way to backup a personal computer is with an external hard drive or USB flash drive.

You don't need special software to do this - just copy the files.

The real issue is getting your files organized so all your documents are saved to the disk in one simple motion. (See organizing your files.)

Also, if you're a data-completist, you'll want to save the settings files (the .dotfiles in Unix, and the hidden Application Data folder in Windows).

If you wish to automate the process, some of the best software to use is "sync" software that compares the copies to the originals, and updates the copies automatically. The program I use is Allway Sync. There are others as well, but I found the interface to Allway Sync easiest to comprehend.

External hard disks have two risks. One is that the power adapter may fail. Another is that, because the drive is in a mobile case, you can drop the disk and have a head crash.

USB flash disks are less prone to damage, but it's possible to put them in your pocket, forget about them, and toss your pants into the washing machine, destroying the device.

USB flash drives also tend to be fragile because they stick out of the USB port. If you want to install one permanently, get a cheap USB extension cord at the dollar store, and tape the drive to your case.

A good backup solution for someone who isn't computer savvy is sync software, a USB flash memory drive, and the aforementioned extension cord. Set it up for them, and tell them to store their documents in only one folder.

External Hard Drive Backup Tips

If you're going to use a large external hard drive, for archival or simple backup purposes, here are the pros and cons of different cases:

External "Toaster"-type adapters

These are square blocks with a slot on top that accepts a SATA hard drive, and connects to your computer through USB or e-SATA.

The pluses are convenience, cost, and speed.

The minuses are the risk of metals shorting out the drive electronics, and a lack of heat dissipation.

External case with fan

These are the best cases - until the fan fails. Then, it's not so great. My personal experience was that the fan failed after a year.

The plus is the fan.

The minus is the risk of the fan failing - potentially leading to a hot hard drive.

External case without a fan

These are the second best cases. The ads say that the case is designed to pull heat away from the hard drive. It works as advertised, but the heat must then be removed from the case. So the entire case needs sufficient ventilation.

Pros: nothing to break.

Cons: you still need to figure out a way to remove heat.

Power supplies: a universal problem

For whatever reason, the power adapters I've used with these external hard drives have generally been junk. They last 1 to 2 years, then fail.

There's no simple solution out there, except to buy another adapter. Make sure the adapter is on a hard surface with good circulation.

(A possible solution is to hack a PC power supply and use that to power all your electronics.)

Unix Backup Scripts with Rsync

Rsync is a good way to create and maintain a "mirror" of specific folders on your unix system. It's not good for archiving, for cloning disks, or running a "full/incremental" backup system.

What rsync does is compare two folders, and synchronize them.

The following command will backup my home folder to an external disk called "/media/extdisk".

rsync -a /home/johnk/ /media/extdisk

Of course, life cannot be that simple. I have some huge folders with a lot of chaff that I don't need. First, I don't want to backup my Downloads directory. Nor do I want to backup my Freenet storage, which is 10 gigs.

I also have a 24 gigabyte Music folder, but don't want to scan that every single time I run a backup. Conversely, I want the Desktop folder backed up frequently.

The typical unix way to handle this situation is to write a backup script. Here's my script. It's stored in the target backup directory, so it's not listed. I "cd" into the directory and run the script:

#! /bin/bash
# Run from inside the backup target directory;
# each line mirrors one source folder into a folder of the same name.
rsync -av /home/johnk/Pictures/ Pictures
rsync -av /home/johnk/Sites/ Sites
rsync -av /home/johnk/Desktop/ Desktop
rsync -av /home/johnk/Documents/ Documents

The backup takes around two minutes to scan 24 gigabytes of data and back up the few new files that appear.
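An alternative to one rsync command per folder is a single run with rsync's --exclude flag (a standard rsync option), which also handles the "skip Downloads and Freenet" case. Here's a runnable sketch using throwaway demo directories; in real use you'd substitute /home/johnk/ and /media/extdisk/:

```shell
# Mirror a home folder while skipping heavy folders with --exclude.
src=$(mktemp -d); dst=$(mktemp -d)       # demo stand-ins for source and disk
mkdir -p "$src/Documents" "$src/Downloads"
echo keep > "$src/Documents/notes.txt"
echo skip > "$src/Downloads/big.iso"
rsync -a --exclude='Downloads/' --exclude='Music/' "$src/" "$dst/"
```

The excluded folder names are relative to the source root, so one pattern skips the whole subtree.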

There are ways to execute this script when the disk is plugged in, but they differ based on OS. On Linux, the system is called udev; it's fairly complex, and I'm still learning it.

Here's an article about creating a backup that launches with udev.

Backup to Floppy

Backing data up to floppy went out with the 1990s. Hardly any computers have floppy disk drives anymore.

That's why it's important to gather your floppy disk based backups and move the data onto a hard disk.

When you do this, you'll be shocked at how much data's been lost to floppy disk degradation.

You may need to clean your floppy disk drive heads if you are a smoker. If you can't find one of those cleaning disks, you can fake it by taking a floppy disk that you aren't going to need and pouring a little alcohol on both sides of the disk. Pop it in for three or four seconds while the heads slide across the surface, then pop it out. The risk here is that the diskette will shed material before the heads get cleaned - so use a new diskette.

Floppy disks are still useful for doing system installations or restorations, so you still see them on some back-office systems.

Cloud Computing Backups

With more and more work being done "in the cloud" with web-based applications that store data on a remote server, edited through a web browser (or specialized client application), you'll want to backup the remote data locally.

The way to do this is to export the data using a tool that automates the process. For example, Google Docs Backup.

One of the nice things about Application Service Providers is that they save you from installing and updating software. The big risk is that they'll upgrade and leave your older documents unusable.

Legacy data in traditional backup scenarios is managed in two ways: one is pickling, where an entire system and software stack is retained to read the data. Another way to manage legacy data is to convert it to newer, more useful formats, or to older generic formats.

Cloud computing leaves you only the latter option. So, applications like Google Docs Backup try to convert the data to something generic.

How to Backup Email

The simplest way to backup an IMAP email account, like a Gmail or AOL Mail account, is to use desktop client software.

Two popular services, Hotmail and Yahoo Mail, don't support IMAP, so, you're kind of "out of luck" with them.

The rest seem to support IMAP.

Two popular IMAP mail clients are Outlook Express (now called Windows Mail), and Thunderbird (from Mozilla). Both are nice because they allow you to create local files, and also save the email in industry standard formats like .eml and .mbox.

You can also script Outlook Express to do some of your dirty work.

The typical backup solution is to create local folders -- folders that aren't stored on the server -- and copy the server's data into these local folders.

There are also IMAP sync tools that copy all the data from one IMAP account to another. These vary in speed, and most aren't fast enough for frequent backups, but they can be used to copy the data over.

If you don't have a second IMAP server (and you probably don't), consider using something like Debian to set up an internal mail server that's used only to hold backups.

How to Backup Google Docs


How to Backup MySQL on a Website

Usually, a web host will give you FTP access to a directory and a web interface. A button in the web interface will produce a .ZIP file with the database contents, and you can download it via FTP.

If you have shell access, and you run a Unix at home, and develop your own website, you can use this script. It dumps the remote database, and then loads it into a local copy of your database.

#! /bin/bash

echo Dumping db to db.mysql.
echo Type your password.
# "uname@remotehost" is a placeholder for your login at your web host.
ssh uname@remotehost "mysqldump -u uname '-p--remotepw---' database_name" > db.mysql
echo Loading
mysql -u root '-p----password----' database_name < db.mysql

It's really just two lines of typing, but having it scripted is nice.

How to Backup Websites


Database Backup

The correct way to backup a database is to use a "mirror" or "replica" of the database. That's a server that's running a duplicate of the database, and, perhaps also acting as a load-balancing server.

These two servers are connected by a network, and as requests come in to the main server, they are either performed on the main server, or passed on to the mirror. Update and delete operations are carried out on the main server, then executed on the mirror.

A typical scenario is to reserve a single IP address for hosting the service (on the LAN). The main database has this address. It also has a second IP address for inter-database communications. The second database has only the inter-db communications IP address. If the main database fails, the second machine "takes over" the IP address.

This failover is combined with database replication. The free MySQL server has this feature; it's described in the MySQL replication documentation.

This provides maximum fault-tolerance.

A simpler backup that doesn't have the advantages of a mirror, is to dump the contents of a database to text files, and back those up. This is obviously a lot cheaper than purchasing a second database server.

If you opt for the latter method, make sure that you can build a database server and load the data quickly.

For archival purposes, you may want to make a database dump regularly, compress it, and have it backed up with the rest of the files.
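A minimal sketch of that dump-compress-archive step, in Python. The function name and paths are my own invention (not from any particular tool); the dump itself would come from something like mysqldump.

```python
# Hedged sketch: gzip a database dump into a timestamped file so the
# regular file backup picks it up. Function and paths are hypothetical.
import gzip
import os
import shutil
import time

def archive_dump(dump_path, archive_dir):
    """Compress dump_path into archive_dir as db-YYYYmmdd-HHMMSS.sql.gz."""
    os.makedirs(archive_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    out_path = os.path.join(archive_dir, "db-%s.sql.gz" % stamp)
    with open(dump_path, "rb") as src, gzip.open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

Run it from cron (or Scheduled Tasks) right after the dump, and the compressed file lands in a folder your normal file backup already covers.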

If you're backing up a database on a website, see the article
How to Backup MySQL on a Website

database.gif 13.02 KB


Computers fail, eventually.

There are hardware failures, and software failures.

Hardware is nice, and tends to fail one part at a time. So, if your system breaks, you can replace the part, and be operational again.

That's assuming that you can still purchase the part. You may have a spare, but, does it still work? Some components, like electrolytic capacitors, can age and fail.

Software failures are harder to detect; sometimes they're invisible when they happen, and only manifest symptoms later, when bad data is revealed.

Full Backup

A full backup is a backup of all the files. It's used in contrast to the incremental backup, which is a backup of files changed since the last backup.

A common problem with full backups of networks or large file servers is that they take a long time. Backing up 300 gigabytes of data can take over half a day (over a 1-gigabit Ethernet network, to a SATA 3, RAID 5 NAS box).

So, full backups are typically scheduled to run over the weekend, when fewer people are using the network.

If there's too much data, a full backup may not be possible. The only solution is to split the file system into separate branches, and backup the branches on different days.

Full backups are performed in conjunction with incremental backups, usually scheduled to run once a day in the evening. A typical schedule is to perform one full backup each month, and then perform an incremental backup each evening.

Generally, it's bad to schedule full backups that fall on the 1st, last, and 15th days of the month, because those are "paydays" and it's possible that accountants may need to use the computers. (I think that means the second and third weekends are best.) That said, backups are important enough to run even if someone's working on the weekend.

Incremental Backups

An incremental backup is a backup of all the files that have changed since the last backup. Typically, you perform a full backup, then a series of incremental backups.
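The core selection rule can be sketched in a few lines of Python: collect every file whose modification time is newer than the timestamp of the previous backup. (This is a simplification of my own; real backup tools use archive bits or catalogs rather than raw mtimes.)

```python
import os

def changed_since(root, since):
    """Yield paths under root whose mtime is newer than `since` (Unix time)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since:
                yield path
```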

You can perform full and incremental backups using tools like BackupExec or NTBACKUP.EXE. All commercial backup tools can do incrementals.

Incremental backups take far less space than full backups, and also take a lot less time to perform. In some situations, it's feasible to run backups during the workday.

Restoration of files from an incremental backup is performed by restoring the latest version of a file. This is done automatically.

Generally, previous versions can also be restored, so incremental backups also serve as a way to archive changes to the file system.

In many cases, incremental backups are better for archives than full backups. For one, files that are created, and then later deleted, in the interval between full backups, never appear in any full backup - only the incrementals record them. The main cost is size: a full backup is the same size as all the files, and keeping incrementals as well requires that space, plus space for all the incrementals.

Incremental backups after periodic full backups are the preferred way to perform backups.

Inverted, Multiple Backups with USB Flash Memory

When you have multiple computers, you might want to put your files on a USB flash disk (aka a thumb drive or jump drive), and backup data to your computer's disks.

Create a folder called "backups", and a folder within it called "usb", and copy your files into there. If you have automatic sync software, you can set it to backup the data frequently. Hard disks are so fast you won't even notice the backups.
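If you don't have sync software, the copy itself is easy to script. A sketch in Python (paths are hypothetical) that copies everything from the USB drive into the backup folder, skipping files that haven't changed since the last run:

```python
import os
import shutil

def mirror(src, dst):
    """Copy files from src into dst, skipping ones already backed up."""
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            s = os.path.join(dirpath, name)
            t = os.path.join(target_dir, name)
            if not os.path.exists(t) or os.path.getmtime(s) > os.path.getmtime(t):
                shutil.copy2(s, t)  # copy2 preserves timestamps
```

Called as, say, mirror("E:\\", r"C:\backups\usb") on Windows (the drive letters are made up).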

Set this up on all your computers, and your risk of data loss is nearly zero.

Just let the data "settle down" and force a file copy before removing the USB flash memory.

You could also set up a similar scheme with a USB external disk. The only issue is that moving hard disks tends to damage them and lead to data loss.

If you have important data, consider using encryption software. Losing the USB flash drive with your vital data would be bad.

Restore Specific Files from a Huge .BKF NTBackup.exe file

I like using NTBackup.exe on the old VMs, but discovered that if you don't keep up on the backup rotations, you will have a very hard time doing restores. The NTBackup.exe restore doesn't make it easy to restore all incrementals of a folder.

It turns out there's a sideways solution with Unix. Copy the huge BKF file to a Linux computer. If you don't have one, use Virtual Box and set one up in a VM.

Next, download the attached application and build it. (Again, if you don't know - you'll have to find a tutorial.)

It's also here:

The command I used was like this:

./mtftar -f inputfile.bkf -o outputfile.tar

(Except I used the full path, and had the input and outputs on different disks, for speed.)

Then you use tar to extract a specific folder:

tar xvf outputfile.tar "F\:/path/to/restore"

The quotes help, as does escaping the colon. The tar file contains the DOS drive letter, unfortunately. The backslashes were converted to regular slashes.

Watch the names scroll up the screen. That sure beats using a mouse and clicking on icons! You might want to redirect that output to a file, so you can see what it restored, to check if there were any files overwritten.
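If the command-line dance gets tedious, the converted tar can also be picked apart with Python's tarfile module. A sketch (the function is my own, purely illustrative) that extracts only members under a given prefix and returns the names it touched, so you have a record of what was restored:

```python
import tarfile

def extract_prefix(tar_path, prefix, dest="."):
    """Extract members whose names start with prefix; return their names."""
    with tarfile.open(tar_path) as tf:
        members = [m for m in tf.getmembers() if m.name.startswith(prefix)]
        tf.extractall(dest, members=members)
    return [m.name for m in members]
```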

mtftar.tar_.gz16.52 KB

Secondary Backups

It's a good idea to run two sets of backups. For one, it's possible that the backup software can fail, leaving some data unsaved. It's also difficult to check the backups every day, so it's likely that some minor glitch could lead to several days without backups. You could run out of space on the backup device. A device may go offline and stay offline for no discernible reason.

Going more than one or two days without a functioning incremental backup is unacceptable. As more work is lost, there's a "network effect": people depend on other people's work, so you have to involve more people in the disaster recovery effort.

A cheap way to avoid this problem is to run two sets of backups - a primary and a secondary backup - with the full backups staggered, and with longer runs of incremental backups on the secondary backup. Store the secondary backup on a different device (a hard disk in your PC is a good place).

This way, if the system fails on the primary, you can use the secondary to recover. If the secondary fails, you have the primary.

In my experience, you can run two backups and check them twice a week, and there is never a situation where both backups are failing, but there's occasionally a problem with one of the backups.

Subversion (svn) as a Backup

If you're programming (or managing programmers), you can use Subversion or any other revision control system as a backup.

If you're not using a revision control system at all, it's a good idea to start immediately.

Not only is the repository (the system where the code is stored) a backup of the code - it's also a way to roll back changes to your code. All the popular systems also enable team programming over the internet.

The few hours spent learning the system pay off many times over in saved time and mitigated risk.

Be Specific; The Inner Platform Effect

Here are some choice words about seemingly perpetual problems that emerge in software development.

The Inner-Platform Effect is the tendency of software architects to create a system so customizable as to become a poor replica of the software development platform they are using.

...In the database world, developers are sometimes tempted to bypass the RDBMS, for example by storing everything in one big table with two columns labeled key and value. While this allows the developer to break out from the rigid structure imposed by a relational database, it loses out on all the benefits, since all of the work that could be done efficiently by the RDBMS is forced onto the application instead.

In computing, the second-system effect or sometimes the second-system syndrome refers to the tendency to design the successor to a relatively small, elegant, and successful system as an elephantine, feature-laden monstrosity.
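The key/value table in the first quote is easy to demonstrate. A small sqlite3 sketch (a toy schema of my own, purely illustrative): the same question - "which products cost more than 10?" - needs string matching and a CAST in the key/value version, but is a plain typed comparison against a conventional table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# inner-platform style: one big table with key and value columns
c.execute("CREATE TABLE kv (entity INTEGER, key TEXT, value TEXT)")
c.executemany("INSERT INTO kv VALUES (?, ?, ?)",
              [(1, "name", "widget"), (1, "price", "9"),
               (2, "name", "gadget"), (2, "price", "30")])
eav = c.execute("SELECT entity FROM kv WHERE key = 'price' "
                "AND CAST(value AS INTEGER) > 10").fetchall()

# conventional schema: the database does the work with real types
c.execute("CREATE TABLE product (id INTEGER, name TEXT, price INTEGER)")
c.executemany("INSERT INTO product VALUES (?, ?, ?)",
              [(1, "widget", 9), (2, "gadget", 30)])
typed = c.execute("SELECT id FROM product WHERE price > 10").fetchall()
```

Both queries return the same row here, but in the key/value version every bit of typing, indexing, and integrity checking the RDBMS would have done falls on the application.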

Burned a DVD but it Won't Play in the DVD Player

First, you have to assume that a given disc or file will not play on the target device.

I know that sounds stupid. It doesn't make sense because “in the real world” you can buy a DVD and play it in a player.

The problem is that the computer sphere is different, and there's no guarantee.

The DVD sphere is dominated by the MPEG standards. MPEG is a standards committee (operating under ISO/IEC), and the patents behind its codecs are licensed through pools that include institutes like Fraunhofer. So all DVD players conform to MPEG. Fraunhofer plus associated companies are a kind of cartel.

The computer market for video is not a cartel. There are competing companies: Apple, Microsoft, Real Networks, Fraunhofer, Sorenson, and a bunch of other companies I don't know. There are also companies like Adobe, Sony, Canon, and chipmakers like Zoran. There are open source projects like Ogg.

Each company has one or more “codecs” or encoders and encoding schemes. These encoding schemes are mutually incompatible with each other. Each company wants to monopolize some video, whether it's online, broadcast, or whatever. They want to dominate a niche, and grow it to overtake the larger market (and become the next Fraunhofer, licensing products to their competitors).

Also, popular encodings like MPEG-2 (an MPEG standard) have multiple vendors. Each vendor pays a little money to the patent holders for the technology, but they make their products independently.

Again, each codec company wants to position itself to dominate a niche. Presumably, they each wish to sell out to a larger company with a bigger market, and be the encoder for that larger niche. (I may be wrong here.)

Each big company, like Apple or Microsoft, has an interest in playing (almost) everything, but producing video that plays well only on their own technology, not the competing technology. They make a concession to MPEG and generally will be able to produce something that might work with DVD players.

It's somewhat like the problem with competing mobile phone carriers : a few large players competing against each other create a wide diversity of small incompatibilities to get customers to switch out of frustration.

The way I've dealt with the problem of competing systems is this:

To extract DVD video, copy the DVD video files to a hard disk, and use AviDeMux. It reads many formats, including the VOB files on DVDs.

On Macs, use Apple's QuickTime. On Windows, use Windows Media. On Linux, use MJPEG. Edit the video, then produce a video file using Apple, Windows Media, or MJPEG (or Ogg), respectively. On Windows, I make a huge WMV file at the best quality.

Then author a DVD using DVDFlick, an open source product based on FFMPEG, to produce DVD files. FFMPEG seems to read a lot of formats, and produces an MPEG-2 file that works with DVD players. It seems to work better than the Apple and Windows options.

Then use Nero or Sonic or another DVD burning software to write out the DVD.

Then, for each burned copy, watch it on a DVD player, not the computer, to make sure it plays. The first copy gets watched all the way through to look for problems that cause playback to fail.

C# and VB Comparison

This is a link to an awesome resource: a "rosetta stone" comparing VB and C#. Totally useful as a general reference, too. One of these days, we need columns for PHP, D, C++, Javascript, etc.

Careful File Copy

This area of the site was getting really chatty, so I've removed it from the Software book, and moved it under DIY notes. The remainder of the experience will be put on the blog. Consider all the material on this page obsolete.


I was asked to help migrate a large batch of ArcMap GIS files to a new server. The problems: the files contain references to other files, and all those files must also be copied. Also, these files are mixed in with other files, not pertinent to GIS, on a single server. To manage growth, it's necessary to move the GIS files out.

Also, it's not a simple file copy. ArcMap can store the references in absolute or relative form. At this office they were stored as absolute paths, because that's more reliable. Thus, it's necessary to flip this bit to "relative," copy it over, and then re-flip it back to "absolute."

Due to the large number of files, and the slow speed of ArcMap, I decided to try and script the process. This sub-site details some of what I've learned in the process.

For a little more info:

Part 1
These are the product of the first iteration of this project. It succeeded in some ways, but failed in others. It succeeded in processing around 150 files before it crashed and failed to process the remaining 1,400 or so.

I concluded that a longer-term project was feasible, due to the tedious and slow nature of this task. (That is, the tedium of copying files exceeded the tedium of reading hundreds of pages of VB help docs, which use "enterprise"-style code examples that usually don't do anything useful, as presented.)

The code

Taken together, these scripts form an almost-functional system. Some of the scripts are installed in an Excel document, and others are installed into the Normal.mxt template in ArcMap. The perl file copier script should be run as a scheduled task. It runs under ActivePerl.

The big problems, so far: VBA doesn't handle OLE server timeouts well; ArcMap chokes on some files; the scripts use the IMxDocument interface instead of the IMapDocument, which might be faster; the scripts pause for one minute while the mxd file loads, instead of polling the app to see if the file is loaded.

The small problems, so far: it'd be better to have the file copier written in VBA; the file format for the manifest files (generated by a script from the esri dev site) should be written for computer processing; using Excel as the process db is kind of cheesy.

Being a noob, I didn't realize that Microsoft's idea of "Automation" was not very thorough. OLE automation, as implemented with Excel and ArcMap, isn't stable enough to do real batch processing. With VB (not VBA) driving ArcMap, I suspect it's possible, but ArcMap will still not provide good error handling.

The code below has the following interesting features:


* The LayerSourceArray code from the ESRI dev site.
* Notes about using multiple interfaces via scripting (you don't).

sapphos.bas.txt 1.29 KB
FileBatcher.cls.txt 2.82 KB
FileSystemScanner.cls.txt 1.56 KB
main.bas.txt 4.84 KB

More VBA Sample Code

Here's some more code to use.

Sub test()
    Dim pDoc As IDocument
    Dim pApp As IApplication
    Set pDoc = New MxDocument
    Set pApp = pDoc.Parent
    pApp.Visible = True
    pApp.OpenDocument ("G:\1217\1217-014\GISFiles\SEIFiles\ArcGISProjects\FieldTransects2.mxd")
End Sub

Sub setRelativePaths()
    Dim pMxDoc As IMxDocument
    Set pMxDoc = ThisDocument
    pMxDoc.RelativePaths = True
End Sub

#! perl

use strict;
use Win32::OLE qw(in with);
use Win32::OLE::Const 'ESRI ArcMapUI Object Library';
use Data::Dumper;

# my $class = 'esriCarto.IMapDocument';
# my $class = 'esriArcMap.Application';
# my $class = 'esriFramework.IApplication';
# 'esriArcMapUI.MxDocument'

# print Dumper( Win32::OLE::Const->Load('ESRI ArcMapUI Object Library') );

my $pDoc = Win32::OLE->new( 'esriArcMapUI.MxDocument', 'Shutdown' ); # || die Win32::OLE->LastError()." no $class";

print Dumper( $pDoc );

my $pApp = $pDoc->Parent();
$pApp->{Visible} = 1;

print Dumper( $pApp );



$pApp->OpenDocument( '' );

Private Sub test()
Dim pDoc As IDocument
Dim pMxDoc As IMxDocument
Dim pApp As esriFramework.IApplication
Dim pDocDS As IDocumentDatasets
Dim pEnumDS As IEnumDataset
Dim pDS As IDataset
Dim pWS As IWorkspace

    ' get a ref to a new ArcMap application
    Set pDoc = New MxDocument
    Set pApp = pDoc.Parent

    ' Loop thru your .mxd documents here

        ' Open an existing document
        pApp.OpenDocument "c:\MyMap.mxd"
        Set pMxDoc = pApp.Document

        ' Iterate thru the datasets and display details
        Set pDocDS = pMxDoc
        Set pEnumDS = pDocDS.Datasets
        Set pDS = pEnumDS.Next
        While Not pDS Is Nothing
            On Error Resume Next
            Set pWS = pDS.Workspace
            If Err.Number = 0 Then
                Debug.Print pDS.Workspace.PathName + " : " + pDS.Name
            Else
                Debug.Print pDS.BrowseName + " : Error with datasource"
            End If
            On Error GoTo 0
            Set pDS = pEnumDS.Next
        Wend

    ' End of your loop

    ' Shut down the ArcMap application

End Sub


Sub muliplemxds()
  Dim sDir As String
  Dim sFile As String
  Dim DocPath As String
    sDir = "C:\Myfolder\TestFolder\"
    sFile = Dir(sDir & "*.mxd", vbNormal)

Do While sFile <> ""
        DocPath = sDir & sFile
        OpenMXDDoc DocPath
        sFile = Dir
Loop

End Sub
Private Sub OpenMXDDoc(sFileName As String)
    On Error Resume Next
    Dim pDoc As IMapDocument
    Set pDoc = New MapDocument
    pDoc.Open sFileName
    Documentation pDoc
    Set pDoc = Nothing
End Sub
Private Sub Documentation(pMxDoc As IMapDocument)
 Dim mapcount As Long, LayerCount As Long, text As String
 text = ""
   Dim pLayer As ILayer
   Dim pFL As IFeatureLayer
   Dim pRL As IRasterLayer
   Dim pFC As IFeatureClass
   Dim pDS As IDataset
   Dim pMap As IMap
    text = text & vbCrLf & pMxDoc.DocumentFilename
   For mapcount = 0 To pMxDoc.mapcount - 1
        Set pMap = pMxDoc.Map(mapcount)
        For LayerCount = 0 To pMap.LayerCount - 1
            Set pLayer = pMap.Layer(LayerCount)
            If TypeOf pLayer Is IFeatureLayer Then
              Set pFL = pLayer
              Set pFC = pFL.FeatureClass
              Set pDS = pFC
              text = text & vbCrLf & pFC.AliasName & vbCrLf & pDS.BrowseName & vbCrLf & pDS.Workspace.PathName
            ElseIf TypeOf pLayer Is IRasterLayer Then
              Set pRL = pLayer
              text = text & vbCrLf & pRL.FilePath
            End If
        Next LayerCount
   Next mapcount
   WriteToTextFile "C:\textfile.txt", text
End Sub
Sub WriteToTextFile(sFileName As String, text As String)
    Dim fso
    Set fso = CreateObject("Scripting.FileSystemObject")
    'Set fso = New Scripting.FileSystemObject
    Dim ts
    'Create File if doesn't exist, if it does, append to the current File
    Set ts = fso.OpenTextFile(sFileName, 8, True)
    ts.WriteLine text
    Set ts = Nothing
    Set fso = Nothing

End Sub


use Win32::OLE;
my $class = "esriGeoprocessing.GpDispatch.1";
my $gp = Win32::OLE->new($class) || die "Could not create a COM $class object";
$gp->{overwriteoutput} = 1;
print $gp->{overwriteoutput};


OLE/ActiveX Scripting Notes

I'm still working on this. These are just notes, and I'm a noob.

The ESRI ArcObjects don't fully support scripting. They support some basic level of scripting, but they don't fully support scripting via contemporary OLE Automation, which is what Perl and other languages use.

Historically, there are three phases of COM/OLE that should help explain this situation a little.

First is COM. COM is a way to factor applications into objects that can be used across languages. Normally, you're constrained by the language.

Second is DCOM or OLE. OLE, and later, Distributed COM allowed for the objects to be located on different computers, or within another application. You could issue a method call to a remote program. The technology to do this involved "interfaces". An interface, in this situation, is a lightweight object that communicates with a remote concrete class, aka, coclass. The interface presents a "local face" for the remote object. To access the objects, you "instantiate an interface." Complex objects typically implement several interfaces, and, to access such an object, you needed to instantiate each interface separately, and then set the instance to the object.

Dim foo as IFooThing
Set foo = CreateObject("Foo.FooThing")

Dim bar as IApplication
Set bar = foo

The first two lines set up an object called foo that is accessing Foo.FooThing via the IFooThing interface. The last two lines set up the bar object to also access Foo.FooThing, but via an IApplication interface.

Third is ActiveX and scripting. This is where we are today. Scripting requires a single interface to the entire object. ActiveX objects have a single dispatch interface to the entire object, called IDispatch.

Historically, much of the ESRI application suite is stuck back in the second period, where the objects lack an IDispatch implementation. Thus, ArcGIS apps are difficult to drive with scripting tools that expect it.

The alternatives are to use VB for Applications, .NET, Java, C++, and VB.

I'm not certain whether Python supports COM interfaces. I believe it does (via the pywin32 package), according to what some sites say.

Part 2: a VB.NET Version of this Project

After a while, it became obvious that there was no way to drive the ArcMap application from Excel -- timeouts from errors wouldn't get handled, so bad runs would hang.

A real app could raise errors on timeouts, so, I had to learn VB OLE programming. Fortunately there's a free version of VB called VB Express Edition. It's a complete VB environment, that uses .NET. Unfortunately, there aren't references for the old VB classes included. .NET is, in parts, a bit more complex than VB - it's a victim of feature-itis. There are also fewer VB.NET tutorials out there.

Here's a diagram of the "new" system, which is, mostly, going to be an iteration of the "old" system.

The app is broken into three parts. One part manages a list of files. One part is a bunch of "scripts" that do the actual work of analyzing, copying, and deleting files. One part is a scheduler that will run the scripts only at specified times, so that it won't interrupt the normal workday.

File Batching
This code fits into the larger goal of a project that will reliably run an application on a set of files, over the course of several nights.

The first thing I've written, so far, is something that will scan the file system for file names, to create a "batch". The batch is stored in a Microsoft Access .mdb file.

The coolest feature is that you don't need Access to run it. It creates the .mdb file from scratch, and inserts data into it.

Another cool feature is the call to System.IO.Directory.GetFiles. That does all the scanning that, in the original project, required custom code.

This is very alpha code, but, it might help someone out there.


Imports System.Data
Imports system.Data.SqlClient

Public Class FileBatch

    Private Const StatusNone = 0
    Private Const StatusProcessed = 1
    Private Const StatusSkip = 2

    Private Sub CreateNewDatabase(ByVal dbPath As String)
        ' delete the file first
        If System.IO.File.Exists(dbPath) = True Then
            System.IO.File.Delete(dbPath)
        End If

        Dim dbCatalog As New ADOX.Catalog()
        dbCatalog.Create("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & dbPath)

        Dim objFirstTable As New ADOX.Table()
        objFirstTable.Name = "FileBatch"
        objFirstTable.Columns.Append("File", ADOX.DataTypeEnum.adLongVarWChar, 1024)
        objFirstTable.Columns.Append("DestinationFile", ADOX.DataTypeEnum.adLongVarWChar, 1024)
        objFirstTable.Columns.Append("Status", ADOX.DataTypeEnum.adInteger)
        objFirstTable.Columns.Append("ProcessingDate", ADOX.DataTypeEnum.adDate)
        objFirstTable.Columns.Append("Comment", ADOX.DataTypeEnum.adVarWChar, 255)
        objFirstTable.Keys.Append("PK_File", 1, "File")

        ' append the table to the catalog, or it is never actually created
        dbCatalog.Tables.Append(objFirstTable)

        dbCatalog = Nothing
        objFirstTable = Nothing
    End Sub

    Public Function CreateBatch(ByVal dbPath As String, _
        ByVal pathStart As String, _
        ByVal ext As String, _
        Optional ByVal statusBox As TextBox = Nothing) As Integer
        Dim ar, element


        If statusBox IsNot Nothing Then
            statusBox.Text = "Scanning for *." & ext & " in " & pathStart & "."
        End If

        ar = System.IO.Directory.GetFiles(pathStart, "*." & ext, IO.SearchOption.AllDirectories)

        Dim cs
        Dim conn As OleDb.OleDbConnection
        Dim command As OleDb.OleDbCommand
        Dim sql As String

        cs = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & dbPath
        conn = New OleDb.OleDbConnection(cs)
        conn.Open()

        For Each element In ar
            ' double apostrophes so a path can't break the SQL string literal
            sql = "INSERT INTO FileBatch (File,DestinationFile,Status,ProcessingDate,Comment) VALUES ('" _
                  & Replace(element, "'", "''") & "','',0,#1/1/1899#,'')"
            ' Console.WriteLine(sql)
            command = New OleDb.OleDbCommand()
            With command
                .Connection = conn
                .CommandText = sql
            End With
            command.ExecuteNonQuery()
        Next

        conn.Close()

        CreateBatch = 1
    End Function

End Class

Here's the code that calls it (from a form button):

    Private Sub Button1_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles Button1.Click
        Dim fb As FileBatch
        fb = New FileBatch
        fb.CreateBatch("C:\tmp\text.mdb", "C:\Documents and Settings\johnkuser\", "jpg", Me.StatusMessage)
    End Sub


Form1.vb.txt343 bytes
Form1.Designer.vb.txt2.24 KB
FileBatch.vb.txt2.49 KB
filebatcher.jpg8.02 KB

Some COM and .NET Notes

This document explains some terminology used on other pages.

ActiveX
A technology layered on OLE that supports an interface, IDispatch, which executes method calls by name (by a string argument). IDispatch solved the problem of scripting languages being late bound, and not able to handle multiple interfaces. ActiveX also covered other technical things, but the IDispatch feature is relevant to this topic.
Assembly
A group of classes. The classes generally work together, and form a namespace. Analogous to a Java package. The .NET assemblies are analogous to the Java class libraries.
CLR, Common Language Runtime
A "virtual machine" that executes programs coded in CIL (the Common Intermediate Language, also called MSIL), a platform-neutral assembly language produced by compilers. The CLR is also called a "managed environment" because the virtual machine takes care of many runtime issues like allocating memory.
COM - Component Object Model
Microsoft's object technology that allows code objects written in different languages to interact with each other. The idea was that you could instantiate an object written in C++ from within VB.
OLE - Object Linking and Embedding
A technology layered on COM that defined how independently running objects would interact with each other. One example is how code in MS Excel can execute a macro in MS Word.
Late Binding - Dynamic Typing
The type of an object is not known until it is used. This contrasts with early binding, or static typing, where you declare that an object is of a specific type, first, then use it. Early binding in the COM environment is used when you declare that an object uses a specific interface. That allows the compiler to check that your method calls conform to the interface.
Managed Code
See CLR. Managed code is any code that runs within the CLR. The execution is "managed" because the CLR takes care of things like memory allocation and threads.
Multiple Interfaces
The technique used by MS VB and COM to implement objects. An object implements an interface, and may implement more than one. To interact with the object, you instantiate the object with the specific interface, and that defines how you interact with it.

Change Web Host Company Without Downtime (Linux or BSD oriented)

This outlines how to change web hosts with minimal downtime. It won't go step by step or explain too much.

I'm using Hurricane Electric as my web host. I've been happy with them, especially after trying other web hosts that didn't deliver the level of performance I needed. (I'm a return customer, having used them back in the 90s.)

Gather all passwords, making sure you can get into your accounts to manage domains, web server files, databases. Get these all in a single text file, for convenience.

Move the files over. If you can, use a tool like rsync, and run an rsync server on the originating server (or on the computer with a staging copy of the site). If you have shell access, you can write a script to sync the files. Here's a bit of my script to sync:

#! /bin/bash
# this syncs everything except images
# necessary because rsync dies when there are too many files
for fn in action_icons admin atom audio authors calendar
do
        rsync -vr  rsync://${fn}/ public_html/$fn/
done

Move the database over. Again, if you have shell access, you can do this with a command like this:

echo "getting database from remote"
ssh ./dump_mydb_org | mysql -pasdfsdf mydb

dump_mydb_org is a script that calls mysqldump with the correct username and password to dump mydb.

To get the application running on the new server, edit your local HOSTS file and create a line for your website's host name. Normally, DNS resolves that name to the old server; the HOSTS entry overrides DNS, so that, on your machine only, the name resolves to the new server's IP address.


Going forward, your machine will point at the new server. You can now get the application working. Usually, that means altering configuration files so they can get to the new database.
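The HOSTS override is a single line; a hypothetical example (address and names made up - the file is /etc/hosts on Unix, or C:\Windows\System32\drivers\etc\hosts on Windows):

```
203.0.113.10    www.example.com example.com
```

Remember to remove the line after the real DNS records are switched, or it will silently shadow DNS on that machine.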

If you have web stats, make sure they are working.

If you have to make updates to the old site, keep doing so, until Friday afternoon. On Friday afternoon, log into the new server, and run both sync scripts to sync the files and the database.

Then, alter the DNS records so your domains point to the new server.

In two to three days, nearly all the DNS records for your site will change over to the new server. On Monday morning, the new site should be getting all the traffic. From Friday evening to Monday morning, you should avoid updating the database. If you must, then, figure out some way to make sure you're only touching the new website and database. Maybe put a file on the new server and read it through the web browser.

On Monday, download the logs from the old site. Shut it down, but don't delete anything. You might need to switch back if the host turns out to suck.

Changing Windows 2000 Professional to Windows 2000 Server

The main reason to do this is to allow more than 10 clients to connect to your computer. Aside from that, Win2k Pro doesn't come with all the applications and services that Win 2k Server includes.

Info stolen from:

Week Ending March 30, 2002

Change Windows 2000 Pro To Windows 2000 Server with Freeware Util
NTSwitch is a small freeware program that allows you to turn an existing NT Workstation or Windows 2000 Professional installation into an NT Server or a Windows 2000 Server environment.

It's well-known that Workstation and Server environments are virtually identical. The operating system decides which "flavor" to run in based on two registry values:

* HKLM\SYSTEM\CurrentControlSet\Control\ProductOptions - ProductType [REG_SZ]
* HKLM\SYSTEM\Setup - SystemPrefix [REG_BINARY 8 bytes]

ProductType is "ServerNT" or "LanmanNT" for servers, and "WinNT" for workstations. The third bit in the last byte of the SystemPrefix value is set for servers, and cleared for workstations.

Since the release of NT4, Microsoft has taken measures to keep the user from changing these registry values. The operating system has two watcher threads that revert any changes made to these two registry settings, as well as warn the user about "tampering".

The good guys at Sysinternals have supposedly created an application called NTTune. They did not release it to the public, but only to the press - their intent was to demonstrate that there's really no difference between Server and Workstation. The application disabled the system threads, thus letting the user change the aforementioned registry values.

The public is curious - people came up with a way of changing these settings without NTTune. Details are here. It involves hacking the NTOSKRNL.EXE executable so that the watchdogs are looking at some other registry setting. While this works, it's definitely not for the faint of heart.

Our utility, NTSwitch, is not as slick as NTTune - it does not disable the system threads. It's not as horrible as the NTOSKRNL.EXE hack either.

Our approach is the following:

* Backup the SYSTEM hive of the registry using the registry API.
* Edit the information contained in the backup file.
* Restore the registry from the backup.
* Reboot the computer so that the changes can take effect.

A quick-and-dirty hack. It works, and it's at least as safe as the two previous solutions. We're giving it away for free. Go here to download it. The readme.txt contained in the zip file might have some late-breaking information; be sure to read it.

Cheapskate Developers Mobile Phone Tips

I feel lame when it comes to mobile phone hacking because I'm behind the state of the art by at least five years. The only good thing about this is that, generally, only games have taken off on phones, leaving the universe of practical applications almost untouched. This is a newb article, so, if you're experienced, go away :)

The problem with programming J2ME devices is that there are so many devices, and they generally support a different subset of J2ME features or APIs (called JSRs). I recently got an LG600g and found out that it's a junk phone for Java hacking. It doesn't support many JSRs.

I wanted to try writing code to extract data from the PIM. No such luck, because the PIM JSRs aren't supported on this phone. In fact, few JSRs are supported. A little sleuthing revealed that this phone looks like an LG KP210.

LG documents the JSRs supported by the KP210 on their developer site. There, you can download SDKs for LG J2ME development.

A similarly priced (cheap) phone, the Motorola i290 from Boost Mobile, is a different story. If you go to the Moto site, there is a complete SDK for the i290, and a lot of different JSRs are supported. Moto is very programmer-friendly, and they have an exemplary site.

Regardless, I'm stuck with the LG until the minutes run down. So, I'll program for that, for now. The limitations are a challenge.

Comment Styles for C-Style Code

If there's anything that annoys people more than funky indentation, it's bad comments. I don't mean comments about the code, but comments in the code.

function name()
{
    /* Once upon a time, all my comments
       were inside the functions. */
}

It seemed to make sense, but there's something that sucks about having to scroll more to start reading the code.

/* Moving the comment up above the function seems to help!
   So I did this for a while. */
function name()

Lately, all the languages are getting automatic documentation generation. They use comments like this:

/**
 * The code comments here get turned into web pages.
 * I like how there's a little extra whitespace above and below.
 * And the stars are a pain to keep adding, even with editor support.
 */
function name()

But the generated docs look really decent. In Perl:

# This function does nothing at all.
sub name

Again, there's all that extra whitespace. It gives the eyes a break.

As you can see, the main trends here are to move the comments out of the function, and to add more visual cues in the comments.

Compile ffmpeg from sources on Ubuntu

For legal reasons, Ubuntu does not ship ffmpeg with some important features enabled. In particular, support for faac AAC encoding is stripped. The older tutorials for building from source don't seem to work.

So, I went to the original sources.

I don't have the full instructions yet, but here are some tips.

Use the latest ffmpeg.

Use the latest x264.

Configure both without the --enable-shared flag.
Configure x264 with --enable-pic.
Configure ffmpeg without the --disable-static flag.

I suspect there are better arrangements that would let everything build as a shared library... but often you just need to get ffmpeg running.

Connecting to Network Printers

Suppose you go onto a foreign network and need to print. There's no network administrator around. How do you install the printer?

First, go up to the printer and push the "menu" button. There are "up and down" buttons to flip through menu items. One of the items will be "print configuration" or something like that -- use that and print the configuration. (You can take this back to your computer, by the way.)

On the printout should be a section with the heading "network" or "tcp/ip" or something like that. Look for a line like "IP address".

On your computer, click "add printer". Depending on whether you're on a Mac or Windows, the setup process is different. (Since I don't have either here, I can't go into detail.)

On Windows, you need to call the printer a "Local Printer", not a network printer -- yes, it seems weird, but that's what you do. Set the port type to TCP/IP, and set the IP address to the printer's IP address.

On a Mac you can select the IP printer and specify the IP address.

At this point you can probably allow the computer to automatically choose a driver. It'll take a while, especially over the internet.

If you're impatient, you can do what I do: before starting the whole process, use Google to find the driver for the printer, and install it first. Then, set up the printer. (If it looks like the printer uses a generic PostScript or PCL driver, you can force it to use the built-in drivers. With cheaper printers, there's a little risk that this won't work, though.)

Convert Text to HTML PHP Function

How many times has this wheel been reinvented? According to my Google searches, not enough - I couldn't find a good one. Over the years, I've built this wheel a few times, so, here goes again. This is a lot better than the stock nl2br() function.

The attached code and test files show it off; only a description follows here.

The text is converted by analyzing it line by line, and building up an array that contains metadata about the document. The metadata describes each line: is it long? Is it blank? Does it look like a quote? The metadata is analyzed to determine paragraph groupings.

This differs from the typical solution of using regular expressions to add HTML code to text. For one, we try not to manipulate the text in place. Rather, we simply "look at" the text, and "notice" features. Later, we analyze the features to determine what tags to insert.

This technique works well because a lot of formatting information is embedded in the layout of the text. By preserving the layout, we can guess what the formatter intended. Also, by allowing for multiple passes over the text, we can refine the metadata.

For example, we could detect if one of the first few lines contains a line that's capitalized like a title. If so, we can assume it's a title, and add that metadata. Then, we can quickly look one line below that and see if it looks like a byline, and if so, add that metadata.

What you detect depends on your data. This function's being written to convert text email messages into HTML, for easier reading on small screens, so bylines and title aren't that important, but getting quoted text right is.
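The attached implementation is PHP; as a rough sketch of the same two-pass, line-metadata idea in Python (the function names and the exact features detected here are my own invention, not the attached code - and, as in the original, linking and escaping are left out):

```python
def analyze(text):
    """First pass: build per-line metadata. Is it blank? a quote? long?"""
    meta = []
    for line in text.splitlines():
        meta.append({
            "text": line.strip(),
            "blank": line.strip() == "",
            "quote": line.lstrip().startswith(">"),
            "long": len(line) > 60,
        })
    return meta

def to_html(text):
    """Second pass: group runs of lines into <p> or <blockquote>."""
    meta = analyze(text)
    out = []
    run = []
    quoted = False

    def flush():
        if run:
            tag = "blockquote" if quoted else "p"
            out.append("<%s>%s</%s>" % (tag, " ".join(run), tag))
            del run[:]

    for m in meta:
        if m["blank"]:
            flush()
            continue
        if run and m["quote"] != quoted:
            flush()  # quote status changed; close the current group
        quoted = m["quote"]
        run.append(m["text"])
    flush()
    return "\n".join(out)

print(to_html("hello\nworld\n\n> quoted line"))
```

Notice that the text itself is never rewritten in place; tags are only decided after the metadata is complete, which is what makes extra passes (titles, bylines) easy to bolt on.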

I've left link detection to another function, and character escaping to htmlspecialchars().

The paradump() function is not related to all this - it's just a way to view the text alongside the metadata.

index.php.txt (3.31 KB)
test.txt (1.45 KB)
text.txt (916 bytes)

Convert Web Pages into Kindle "Books" (Documents)

The script below accepts a URL parameter, downloads the HTML, converts it to a .mobi file with kindlegen, and copies the file onto your Kindle. It works on Ubuntu, but can be altered for your environment. It's written in Perl, and requires kindlegen and wget. You can get kindlegen from Amazon's website; wget is in your distribution's repositories.

The only "trick" it has is reading the document's title, and using that as the document's filename. That should help avoid problems with files overwriting each other, to some extent.

$KINDLE is the documents directory on your Kindle. If you're using another Linux distro, it might appear in another directory. $KINDLEGEN is the path to the kindlegen command.

#! /usr/bin/perl

$KINDLE = '/media/Kindle/documents';
$KINDLEGEN = '/home/johnk/bin/kindlegen';

use File::Copy;

$url = $ARGV[0];

system("/usr/bin/wget", "-O", "/tmp/kindle.html", $url);  # list form avoids shell quoting problems with the URL

open FH, '</tmp/kindle.html';
@lines = <FH>;
close FH;

@titles = grep { $_ =~ /<title>/i } @lines;
$titles[0] =~ m#.*<title>(.+)</title>.*#i;
$text = $1;
$text = 'index' if (! $text);

print "title is $text\n";

$text =~ s/[^a-zA-Z0-9 ]//g;
$text =~ s/\s/-/g;
$text = lc($text);
$text = substr $text,0,30;

$filename = $text.'.html';
$mobifilename = $text.'.mobi';
print "filename is $filename\n";
print "mobifilename is $mobifilename\n";

rename("/tmp/kindle.html", "/tmp/$filename");

system("$KINDLEGEN /tmp/$filename");

copy("/tmp/$mobifilename", "$KINDLE/$mobifilename") or die "Copy failed: $!";

Converting Time or Datetime to UTC in Python

This seems so basic, it's almost embarrassing to publish, but this showed up a few times on Stackexchange. I had trouble figuring it out, too, partly because the Python docs are so lengthy.

The scenario: you have a textual timestamp with a timezone, and need to convert it to UTC. I had one in an email Date header, and needed it printed in UTC for the envelope.

The input looks like this:
Tue, 17 Mar 2009 18:57:55 -0300

from dateutil.parser import parse
from datetime import timezone, datetime, timedelta

t = parse("Tue, 17 Mar 2009 18:57:55 -0300")

## first way
tu = t.astimezone(timezone.utc)

## second way
tu = datetime.utcfromtimestamp(t.timestamp())

## and as text
text = tu.strftime("%c")

The main difference between the first and second ways is that the second way strips off the timezone info (tzinfo), so it becomes a "naive" datetime. The first way keeps the tzinfo, set to UTC, which is better.

Also, the Date lines in email headers vary, and taking a slice of [6:37] will chop off trailing timezone markings like "(GMT-08:00)" that cause the parser to barf.
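As a sanity check on that slice - assuming it's taken against the full header line, where "Date: " is six characters and the timestamp itself is 31 more (the header value below is made up for illustration):

```python
# A made-up Date header. [6:37] skips the 6-character "Date: " prefix
# and keeps the 31-character timestamp, dropping the trailing marking.
header = "Date: Tue, 17 Mar 2009 18:57:55 -0300 (GMT-03:00)"
timestamp = header[6:37]
print(timestamp)  # Tue, 17 Mar 2009 18:57:55 -0300
```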

Create SQL tables from CSV headers

Not sure where this goes, but it's a page that will generate MySQL code from the header line of a CSV file.

Link to mini app.
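The mini app's source isn't reproduced here, but the core idea can be sketched in a few lines of Python (the function name, the "imported" table name, and the TEXT-for-everything typing are my assumptions, not necessarily the app's actual behavior):

```python
import csv
import io
import re

def create_table_sql(csv_text, table="imported"):
    """Generate a MySQL CREATE TABLE statement from a CSV header line.
    Every column is typed TEXT - a simplifying assumption."""
    header = next(csv.reader(io.StringIO(csv_text)))
    cols = []
    for name in header:
        # sanitize each header into a safe SQL identifier
        ident = re.sub(r"[^A-Za-z0-9_]", "_", name.strip())
        cols.append("    `%s` TEXT" % ident)
    return "CREATE TABLE `%s` (\n%s\n);" % (table, ",\n".join(cols))

print(create_table_sql("first name,email,age\nalice,a@example.com,30"))
```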

Debian Exim: how to Whitelist a host or IP that is in a blacklist

Exim4's docs need some work, especially the split config.

First, you need to make a new config file /etc/exim4/conf.d/main/000_localmacros

Then, in the file:


Or whatever networks you need to allow to relay.

Then /etc/init.d/exim4 reload

Deleting a Windows User You Can't See

Windows XP's Users control panel doesn't show all the users. I had to delete the "postgres" user to reinstall PostgreSQL on my computer. To do this, I had to run this command:

NET USER postgres /DELETE

To see all the users, type NET USER.

Django 1.8 Tutorial

This is a series of tutorials about Django 1.8. I'm a Django newbie and these are, more or less, my learning notes. These differ a little from other tutorials because it's assumed you know how to program, but don't know Django well, and you have worked through other tutorials.

Older tutorials focus on writing views in the "functional style" of Django; these all try to use the classes, and favor generic idioms.

Django 1.8 Tutorial - 1. A Minimal Application Made Using Generic Class Based Views

I'm not a Django expert. I'm a Django novice writing this document to help other novices. There are probably errors, and I welcome corrections.

This is an intermediate level document for people who know how to program, are fairly comfortable with Python, have done one or two Django tutorials, and know the Django file layout, but haven't really "gotten" Django or the generic View classes.

We will create a small web application with all the CRUD operations in around 120 lines. The tutorial emphasizes using the most generic names.

Sources are attached, below.

Generic View Classes, the docs, and how to read them

The generic View classes are the best thing in Django. They allow you to create views with minimal work - these classes take care of all the usual features: displaying lists of objects, object details, forms to edit objects, forms to update objects, and interactions to delete objects.

They are called: ListView, DetailView, CreateView, UpdateView, and DeleteView.

The docs are confusing. Object-oriented design uses inheritance hierarchies. A method that is available on a class may be defined in the class, or may be defined higher up in the hierarchy.

In Django's case, most of the generic view class behavior is defined in mixin classes. The documentation also follows the inheritance hierarchy, so you don't have all the docs for a given class in the class - it's in the docs for the parent class or mixin class that provides a method or behavior.

To read the docs, you need to dig around the docs, particularly in the docs for the mixins.

So, when you're puzzling over what a View does, drill down into the Mixins. They are all listed in the section titled "Ancestors (MRO)." "MRO" means "method resolution order." This lists the order in which the methods inherited from mixins and the parent class are resolved.

(What's a mixin? It's like a library of code that's included into a class. Django has view classes, and they pull in code from various mixins, each one providing one or more methods or functions. These mixins are re-used across the views.)
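The idea can be shown in plain Python, outside Django (the class names here are invented for the example, and the second print shows the MRO, the method resolution order mentioned above):

```python
import json

# A tiny mixin: it supplies render(), and expects the class it's mixed
# into to supply get_data(). This is not Django code - just the pattern.
class JSONMixin(object):
    def render(self):
        return json.dumps(self.get_data())

class HelloView(JSONMixin):
    def get_data(self):
        return {"message": "hello"}

print(HelloView().render())
print([c.__name__ for c in HelloView.__mro__])
```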

Note: There's a simplified documentation reader named CCBV. It will make sense after you've read the regular docs, and written some code.

A Sample Model: Comments
For this tutorial, we're going to start making a simple comment system. This first try won't behave like a real comment system, but we'll be able to create, edit, and delete something resembling comments. To start it:

./ startapp comment

Then, you add 'comment' to the INSTALLED_APPS in

There is a single model, Comment. It's related to the default Django user auth system:

from django.db import models
from django.contrib import auth
from django.core.urlresolvers import reverse

class Comment(models.Model):
    author = models.ForeignKey(auth.models.User)
    title = models.TextField(max_length=200)
    text = models.TextField(max_length=1024)

    def get_absolute_url(self):
        return reverse('comment_detail', args=[str(])

get_absolute_url will be explained later, but it allows some views to automatically send you to the detail view of the object after creating or updating the object.

Enable the Default Admin
After creating the model, enable it in the admin, so we can add some data. I won't get into this in detail - there are a lot of good tutorials out there.

from django.contrib import admin
from models import Comment

class CommentAdmin(admin.ModelAdmin):
    pass, CommentAdmin)

Then, set up the superuser, log into the admin, and add a comment.

Generic View Classes Eliminate Forms

The CreateView and UpdateView classes allow you to create editing views.

What makes them nice is that they don't require Form classes.

What is a Form class? In Django, Form classes render forms and perform validation. They are one of the nicer Django features, because they relieve you from writing HTML forms, which are typically the most convoluted and often error-prone parts of web applications. The CreateView and UpdateView automatically create a Form object by inspecting the Model object.

In the project's, include the app's urls:

from django.conf.urls import include, url
from django.contrib import admin
import comment.urls

urlpatterns = [
    url(r'^c/', include(comment.urls)),
    url(r'^admin/', include(,
]

Then create the file for the app:

touch comment/

This is the complete comment/ It'll be explained in detail later:

from django.conf.urls import url
import views

urlpatterns = [
    url(r'^$', views.CommentList.as_view(), name='comment_list'),
    url(r'^(?P<pk>[0-9]+)/$', views.CommentDetail.as_view(), name='comment_detail'),
    url(r'^create/$', views.CommentCreate.as_view(), name='comment_create'),
    url(r'^(?P<pk>[0-9]+)/edit/$', views.CommentUpdate.as_view(), name='comment_edit'),
    url(r'^(?P<pk>[0-9]+)/delete/$', views.CommentDelete.as_view(), name='comment_delete'),
]

Then, create comment/

from django.views.generic import ListView, DetailView
from django.views.generic import CreateView, UpdateView, DeleteView
from django.core.urlresolvers import reverse_lazy
from comment.models import Comment

class CommentList(ListView):
    model = Comment

class CommentDetail(DetailView):
    model = Comment

class CommentCreate(CreateView):
    model = Comment
    fields = ['author', 'title', 'text']

class CommentUpdate(UpdateView):
    model = Comment
    fields = ['author', 'title', 'text']

class CommentDelete(DeleteView):
    model = Comment
    success_url = reverse_lazy('comment_list')

These two files are the heart of the application: routes and behaviors, external interface and internal state.

They won't display anything without templates. So, we need to take a little detour to explain how templates are named and found by the template system.


Templates

Templates are tagged-up HTML files that display our objects.

To keep your application self-contained, you can put the templates in the app's directory. To enable this feature, in your, set 'APP_DIRS' to True:

'APP_DIRS': True,

Then create these two directories, where appname is your app's name:

appname/templates/
appname/templates/appname/

For our tutorial, the directories are comment/templates/ and comment/templates/comment/. See the sources for details.

The templates go in both directories. If you use template inheritance (and you should), the base.html file goes in templates/. The rest of the application templates go in templates/appname/.

Default Names for Templates

Though you can set the name of the template via the template_name property, we're going to use the default template names.

The default names are constructed using these patterns:

model_list.html
model_detail.html
model_form.html
model_confirm_delete.html

The model is the lowercased name of the Model. In our case, it's "comment", so the files are named "comment_list.html" etc.

You should also create a file, "base.html", in the app's templates directory:

comment/templates/base.html

You should stick with "base.html" because it is a convention. Here's our minimalist base.html:

{# templates/base.html #}
        {% block content %}
        {% endblock %}

Contexts for these Templates

A "context" in Django is a dictionary of names and values that can be inserted into a template. A template cannot read any variables defined in the application; it can only read the context.

Each generic view defines values in the context, giving them generic names. For example, in the list, there's a list named "object_list" that contains the model objects retrieved from the database. In the detail view the object is named "object". The names of the objects can be changed, and additional objects can be defined in the context, but we will use the default names.
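Python's standard string.Template shows the same idea in miniature - this is not Django's template engine, just an analogy for "a dictionary of names the template can read":

```python
from string import Template

# The "context" is just a dictionary; the template can only read
# what's in it. (string.Template stands in for Django's engine here.)
context = {"object_list": "three comments"}
page = Template("<p>Showing $object_list</p>").substitute(context)
print(page)  # <p>Showing three comments</p>
```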

Looking at the URLs and CommentList Again

A quick perusal of and shows that the urls point to the views, so we'll discuss each pair of url configuration and View class, and the template that they use.

This URL configuration maps the root URL to the CommentList class, and it's named "comment_list":

url(r'^$', views.CommentList.as_view(), name='comment_list'),

class CommentList(ListView):
    model = Comment

This view will render a list of comments using the comment_list.html template.

In the ListView, the context contains a list of found comments, and the name is "object_list". Here's our template, templates/comment/comment_list.html:

{% extends "base.html" %}

{% block content %}
    {% for comment in object_list %}
        {{ comment.title }}
        <a href="{% url 'comment_detail' %}">detail...</a>
    {% endfor %}

    <a href="{% url 'comment_create' %}">Add...</a>
{% endblock %}

We loop over object_list and display the title and a link to get details about the object.

Incidentally, Django will also define "comment_list" in addition to object_list. They will point to the same thing. This tutorial uses the generic names.

The tags {% url 'comment_create' %} and {% url 'comment_detail' %} will be explained later.

Generic or Specific
The Django tutorials and some other ones prefer to use the model-specific names. I tend to favor the generic names, unless there are two objects, or two lists, or other situations where a generic name doesn't work. It's idiomatic, and generic, so you can reuse the same code.


CommentDetail

The CommentDetail class extends the DetailView class, and displays the entire object. The URL includes a regex that captures a parameter named "pk", and passes that into the view.

class CommentDetail(DetailView):
    model = Comment

Explaining regexes deserves a longer article, but here's a summary of what the expression in the url does.

The following regex means "capture a string of digits; call it 'pk'." "pk" is the generic name for a primary key. Django sometimes also uses "id", but we will use "pk" because it seems to be used in more places.

^(?P<pk>[0-9]+)/$

^ means "the start of the string". It prevents partial matches.

$ means "the end of the string". It prevents partial matches.

( ... ) means "capture this subpattern".

?P<...> means "name this subpattern", using the name between < and >.

[0-9]+ is a regex that matches strings of digits.

The / before $ forces the URL to terminate with a /. (Django can add the / to URLs automatically by setting APPEND_SLASH to true.)
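These pieces can be checked directly in Python (the test strings below are invented):

```python
import re

# The pattern described above: capture a run of digits, named "pk".
pattern = re.compile(r'^(?P<pk>[0-9]+)/$')

m = pattern.match('42/')
print(m.group('pk'))          # 42

# ^ and $ prevent partial matches:
print(pattern.match('x42/'))  # None
print(pattern.match('42'))    # None - no trailing slash
```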

The DetailView expects a single named parameter (a kwarg) named "pk". It then gets that model object, putting it in "object" in the context.

It's then displayed using the comment_detail.html template:

{% extends "base.html" %}

{% block content %}

<h2>{{ object.title }}</h2>
<p><em>by {{ }}</em></p>
<p>{{ object.text }}</p>

<a href="{% url 'comment_edit' %}">Edit</a>
<a href="{% url 'comment_delete' %}">Delete</a>

{% endblock %}

What is that {% url %} ?

In the above two templates, we've seen the {% url %} tag. This tag creates a URL from a URL's name. If you look at, you'll notice that each call to url() contains a name parameter; that's the URL's name.

The {% url 'comment_list' %} "reverses" the name, and turns it back into a URL. In our app, it's "/c/".

The {% url 'comment_detail' %}, {% url 'comment_edit' %}, and {% url 'comment_delete' %} tags all take a second parameter that specifies an object's primary key. The name "pk" is just the default name for the primary key. Likewise, "object" is the default name of the object.

Suppose that the pk of the object is 1.

{% url 'comment_detail' %} resolves back to /c/1/.

{% url 'comment_edit' %} resolves back to /c/1/edit/.

{% url 'comment_delete' %} resolves back to /c/1/delete/.

These URLs are then embedded in A tags to form links.

The mechanism that reverses the URL uses the regex to reconstruct it. If you supply a value for "pk", that value replaces the named expression (?P<pk>[0-9]+) in the URL configuration.

CommentCreate and CommentUpdate

Creating and updating an object are similar, so I'll cover them together. The URLs are similar to CommentList and CommentDetail.

class CommentCreate(CreateView):
    model = Comment
    fields = ['author', 'title', 'text']

class CommentUpdate(UpdateView):
    model = Comment
    fields = ['author', 'title', 'text']

Both these classes require a property, fields, to define what fields to include in the form.

The form for CommentCreate and CommentUpdate

The Django CreateView and UpdateView share a template, named model_form.html.

In our app, CommentCreate and CommentUpdate both use the comment_form.html template, and insert an object named "form" in the context.

The CommentUpdate view defines "object" in the context, setting it to the comment being edited. The CommentCreate view doesn't define "object" - we're creating a comment from nothing, so there is no object. This template includes some logic that checks for "object", to switch between headings for creating or editing the comment.

The template is templates/comment/comment_form.html:

{% extends "base.html" %}

{% block content %}
    {% if object %}
        <h2>Edit {{ object.title }}</h2>
    {% else %}
        <h2>New Comment</h2>
    {% endif %}
    <form method="post">
        {% csrf_token %}
        {{ form.as_p }}
        <input type="submit" />
    </form>
{% endblock %}

The CommentDelete View
The CommentDelete view shows the form in comment_confirm_delete.html, which confirms deletion. It's just a modification of comment_form.html. The view defines "object" in the context, setting it to the object. The logic in the view is a little odd: if the form is requested via GET, it'll show the template and form. If it's submitted via POST, it'll delete the object.

Thus, the user flow first requests by GET, which displays the object and a confirmation. Then, when the form is POSTed, the object will be deleted. The user is then forwarded back to the list.

The view's code is:

class CommentDelete(DeleteView):
    model = Comment
    success_url = reverse_lazy('comment_list')

The template is:

{% extends "base.html" %}

{% block content %}

<form method="post">
    {% csrf_token %}
    <p>Are you sure you want to delete {{ object.title }}?</p>
    <input type="submit" value="yes" />
    <a href="{% url 'comment_list' %}">no</a>
</form>

{% endblock %}

What is {% csrf_token %}?
CSRF means Cross Site Request Forgery. It's an attack on the website. Search for it. Putting the {% csrf_token %} in your form prevents this attack. If you don't add this tag, Django will complain.

The Comment model's get_absolute_url() and reverse()
get_absolute_url() is called by the CreateView and UpdateView views to redirect the user to a page after an object is successfully created or edited.

    def get_absolute_url(self):
        return reverse('comment_detail', args=[str(])

The reverse() function works like the {% url %} tag, taking the name of a registered URL, and inserting the PK into the URL.

In this code, which was copied from the Django tutorial, we use positional arguments and instead of The following also works:

    def get_absolute_url(self):
        return reverse('comment_detail', args=[str(])

You can also use kwargs, like this:

    def get_absolute_url(self):
        return reverse('comment_detail', kwargs={'pk': str(})

When I was learning this, I was looking for a way to use success_url to set the next url, just like we do in other views. However, I was stumped as to how it could determine the PK for a specific object. Well, I guess the answer is, "you can't set it in a class attribute like success_url." At that moment, you don't know the PK. You only know the PK when the request is made, or after the new object is saved.

pk, the Primary Key

Chances are, when you make a database, you call the primary key "id" or "TableNameID" or "tablename_id", depending on your habits of naming tables.

In Django, you can give your primary key a similar name, but "pk" always works as an alias for it. Throughout applications, you will see "pk" used in the url() patterns in urlpatterns, and also in the url tags in the templates. The generic views default to using "pk", so we should also use "pk" as the primary key.

More on Templates: Overriding Templates

Django allows you to override a template. It does this by searching for templates in different directories, so when a template says {% extends "base.html" %}, Django searches for base.html across different directories. The normal way to override your app's base.html is to put one in the project's 'templates' directory. (This can be changed in the TEMPLATES setting, but other docs seem to do it this way.)

To enable this overriding behavior, you need to change the DIRS value in the TEMPLATES list, in your

'DIRS': [os.path.join(BASE_DIR, 'templates')],

That adds templates/ in the project's directory (the one with in it) to the template search path. Django will first look for the template in the project's templates/, then in the app's templates/.

The reason for the "base.html" convention is simple: if different apps all use "base.html" as the name for the base template, a single "base.html" file in the project's templates can be used by all the apps. Just like using "pk" and "object" and "object_list" and "*_set", it's more generic.

The attached app doesn't really look right. The widgets should be changed, and there's no CSS, and it really doesn't function like a comment system. I'll fix these for the next article.

minimal.tgz (7.56 KB)

Django 1.8 Tutorial - 2. Polishing the App with Static Files, CSS, JS, etc.

The last article took several hours to write, so I'm going to take a break from writing and editing for a while. These tutorial posts will still happen, but they'll be harder to read.

The previous tutorial created a "comment system", and while it was a reasonable example of using ModelForms and generic View classes, it didn't look like a real comment system. This tutorial polishes the original and makes it more like a real web app.

Here's what it looks like:

It's still not "nice", but it's getting there.

Serving Static Files

The first time I read about DJ-Static and the issue of serving static files, I was baffled. Wasn't the application going to run on a web server? They serve static files all day long.

Well, it turns out Django isn't a web server - it's an application that maps requests for URLs to functions. You may have already had this "aha moment": app servers are not web servers.

Web pages, however, are typically served from web servers, and also typically have links to images, CSS, and JS files, usually on the same server.

Django supports this through its static file handling. The big, somewhat confusing issue is where these static files are stored.

It turns out they can be stored in many different places. The path in the URL is *not* the path in the file system. I know that sounds weird - but think of it like the way we use templates. The server searches for templates, by name, in multiple locations in the file system. URLs to static files map to static files in the Django application, but the application server searches for them.

This searching behavior allows us to store some CSS files in the app's directory, under the 'static/' folder.

Static Files Can Override Other Static Files

My configuration searches for files in the filesystem first - in the 'static/' directory in the project root - and then searches for static files in the app's 'static/'.

So, for example, if you were looking for "/static/comment/main.css", it would first look for minimal/static/comment/main.css. If that didn't exist, it would look for minimal/comment/static/comment/main.css, and deliver that.

This allows designers to override the programmers :) An app can come with some static CSS and JS files, but these files can be overridden by identically named files in the project's static/ directory.
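The search behavior is easy to mimic in plain Python. This is just the idea, not Django's finders code, and the function name is made up:

```python
import os

def find_static(relpath, search_dirs):
    """Return the first existing file for relpath, checking the
    directories in order - earlier directories win (override)."""
    for d in search_dirs:
        candidate = os.path.join(d, relpath)
        if os.path.exists(candidate):
            return candidate
    return None

# e.g. find_static('comment/comment.css',
#                  ['minimal/static', 'minimal/comment/static'])
```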

Our static file config is this:

STATIC_URL = '/static/'

STATICFILES_DIRS = (
    os.path.abspath(BASE_DIR + '/static/'),
)

STATIC_URL is the path prefix that indicates that we want a static file.

STATICFILES_DIRS is an iterable with a list of search paths. The paths must be absolute, so we're using the BASE_DIR variable, which was set earlier in the script.

Our static file is in minimal/comment/static/comment/comment.css, and in templates, it's written as:

<link rel="stylesheet" href="{% static 'comment/comment.css' %}" />

It's not quite "standard" yet - there should be a "css" directory - but it's close enough.

The file's contents are:

.comment {
    border: 1px solid silver;
    margin: 10px;
    padding: 10px;
    width: 300px;
}

label {
    display: block;
}

There's not much to it. The .comment style draws a box around each comment. The label style forces the form widgets to appear below their labels. CSS without some context is nonsense, though. We'll get back to this later.

Improvements to the App's Model and Views

Let's look at the changes, starting with the model. We removed the title field and replaced it with a snippet from the start of the text.

class Comment(models.Model):
    author = models.ForeignKey(auth.models.User)
    text = models.TextField(max_length=1024)

    def get_absolute_url(self):
        return reverse('comment_detail', kwargs={'pk': str(self.pk)})

    def get_first_chars(self):
        return "%s..." % (self.text[0:30],)

    title = property(get_first_chars)
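The title property can be tried outside Django; this is just the property mechanics with a plain class (FakeComment is made up for illustration):

```python
# plain-Python version of the snippet-as-title trick; no Django involved
class FakeComment(object):
    def __init__(self, text):
        self.text = text

    def get_first_chars(self):
        return "%s..." % (self.text[0:30],)

    # property() turns the method into a read-only attribute
    title = property(get_first_chars)

c = FakeComment("The quick brown fox jumps over the lazy dog")
print(c.title)  # the first 30 characters, plus "..."
```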

After making this change, you need to do a "./manage.py makemigrations" and a "./manage.py migrate" to sync the db.

Next, we modified views.py. I put a form in there, even though good practice says to keep forms in a separate forms.py. I'll be bad today.

I added

from django import forms

Then added a CommentForm, and altered the CommentList view:

class CommentForm(forms.Form):
    author = forms.ModelChoiceField(queryset=auth.models.User.objects.all())
    text = forms.CharField(widget=forms.Textarea)

class CommentList(ListView):
    model = Comment

    # adding a form to a listview
    def get_context_data(self, **kwargs):
        form = CommentForm
        context = super(CommentList, self).get_context_data(**kwargs)
        context['form'] = form
        return context

Then, below this, we removed 'title' from the fields lists.

The big change is that we now override get_context_data, and insert our form object, named 'form', into the context.

Which Way Do We Alter the Context?

I've seen two ways to alter the context in get_context_data(). I'm not sure which is more correct, but the one I decided to use conformed to the Django docs:

# This creates a new context, adds our object, and then passes it to the
# parent's get_context_data() method. The risk is that our object will 
# be destroyed.
class CommentList(ListView):
    model = Comment

    # adding a form to a listview
    def get_context_data(self, **kwargs):
        form = CommentForm
        context = {
            'form': form,
        }
        return super(CommentList, self).get_context_data(**context)

# This gets the context from the parent's get_context_data() method,
# then adds our form to it. This seems better.
class CommentList(ListView):
    model = Comment

    # adding a form to a listview
    def get_context_data(self, **kwargs):
        form = CommentForm
        context = super(CommentList, self).get_context_data(**kwargs)
        context['form'] = form
        return context
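A toy (non-Django) sketch of the difference: if the parent's get_context_data() happens to write the same key, the first style loses our value, while the second keeps it. Parent here is made up to demonstrate the mechanism:

```python
class Parent(object):
    def get_context_data(self, **kwargs):
        context = dict(kwargs)
        context['form'] = 'parent form'   # parent writes the same key
        return context

# style 1: build a context first, pass it up - the parent can clobber it
class RiskyChild(Parent):
    def get_context_data(self, **kwargs):
        context = {'form': 'our form'}
        return super(RiskyChild, self).get_context_data(**context)

# style 2: get the parent's finished context, then add to it
class SafeChild(Parent):
    def get_context_data(self, **kwargs):
        context = super(SafeChild, self).get_context_data(**kwargs)
        context['form'] = 'our form'
        return context
```

Adding to the parent's finished dict is the pattern the Django docs show, and this is why.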

New Templates and UX

The template for the comment_list was also altered: comment_list.html is now:

{% extends "base.html" %}

{% block content %}
    {% for comment in object_list %}
    <div class="comment">
        <p>{{ comment.text }}</p>
        <p><a href="{% url 'comment_edit' comment.pk %}">edit</a>
            <a href="{% url 'comment_delete' comment.pk %}">delete</a></p>
    </div>
    {% endfor %}

    <form method="post" action="{% url "comment_create" %}">
        {% csrf_token %}
        {{ form.as_p }}
        <input type="submit" />
    </form>
{% endblock %}

There are a number of UI changes, including the use of class="comment" and a DIV to wrap each comment, but the main event is at the bottom. There's a form to enter a new comment.

The first thing to notice is that we can now use {{form.as_p}} in the template, because we added the form to our context.

The second thing to notice is action="{% url "comment_create" %}", which will cause our form to post to a different view, named comment_create.

There's a little twist here - we're using a hand-coded form, CommentForm, to post the data. The comment_create view then handles the POST with a generic view that uses an automatically created ModelForm object for the Comment model.

I did this because I couldn't find an easy way to extract the ModelForm from the Model. Even if I could, I would have wanted to alter the text field so it used a Textarea widget.

comment_detail.html, comment_form.html and comment_confirm_delete.html were also edited, to have more links, and generally create a better UX.

The base.html was also altered, to include the static files.

    {% load static from staticfiles %}
    <html>
      <head>
        <link rel="stylesheet" href="{% static "main.css" %}">
        <link rel="stylesheet" href="{% static "comment/comment.css" %}">
      </head>
      <body>
        <!-- Global base template. -->
        {% block content %}
        {% endblock %}
      </body>
    </html>

The main thing to notice is that we use a new tag {% static %} that is similar to {% url %}. It invokes the static-file finding code to seek out and return the files that match.

The main.css file is a "CSS reset" file. I use only a couple resets:

* { box-sizing: border-box; }
body {
    font-family: sans-serif;
    margin: 0;
    padding: 0;
}

The first line changes the box model for everything, so that an element's width includes its padding and border, the way old IE did it. It's not the default, but it's easier to reason about than the standard content-box model.

The other lines make it look a little more "web 1.5", a la Craigslist.


Some code problems will be fixed and uploaded later this week.

The most glaring problem is that this form lacks a user login - you still choose the username from the dropdown. That's obviously wrong. We need to add a login.

Another missing feature is that there's no photo upload. Gawker and other sites copied this feature from anime boards and other fansites that allow photos in comments.

Future tutorials will add these features.


Random Notes


I heard on a podcast that the term "static" is wrong, and it should be "assets". Every trade has its own jargon.

Attachments: django-comment-form.png (7.24 KB), minimal2.tgz (12.56 KB)

Django 1.8 Tutorial - 3. Adding Account Logins

The previous article cleaned up the UI and made the comment system work more like a comment system, but it has a glaring flaw: you could choose to post as any user. LOLz.

This small modification adds login and logout features. It does it the raw way rather than use the built-in classes, or the django-registration-redux library. This is just a temporary feature, an example to learn authentication.

That said, it does something a little different from what seems to be provided by Django: the login is embedded right in the page, where the form would have been.

You start off by adding two urls, one to log in, and one to log out. Add this to the urlpatterns in the app's urls.py:

url(r'^login/$', views.LoginView.as_view(), name='login'),
url(r'^logout/$', views.LogoutView.as_view(), name='logout'),

Then you create the views and a login form. We also need to remove the user selector from the comment form. (Now the comment form has just one field.)

class CommentForm(forms.Form):
    text = forms.CharField(widget=forms.Textarea)

class LoginForm(forms.Form):
    username = forms.CharField(label="User")
    password = forms.CharField(widget=forms.PasswordInput, label="Password")

Since we have the comment form on the CommentList view, we need to add the login form there, as well. It'll be switched in the template's logic.

class CommentList(ListView):
    model = Comment

    # adding a form to a listview
    def get_context_data(self, **kwargs):
        form = CommentForm
        loginform = LoginForm
        context = super(CommentList, self).get_context_data(**kwargs)
        context['form'] = form
        context['loginform'] = loginform
        context['is_authenticated'] = self.request.user.is_authenticated()
        return context

The CreateView needs to be modified because we no longer specify the author in the form. Instead, we need to get the user from the request. This is kind of tricky. Here's the code:

class CommentCreate(CreateView):
    model = Comment
    fields = ['text']

    def form_valid(self, form):
        comment = form.save(commit=False)
        comment.author = User.objects.get(pk=self.request.user.pk)
        comment.save()
        return http.HttpResponseRedirect(reverse('comment_list'))

form_valid(self, form) is called after the form is validated. Note that the form is a ModelForm based on Comment, but includes only the 'text' field. So, when the input contains only text, with no author, it's considered valid.

The form validates, and form_valid is called.

The first line, form.save(commit=False), saves the form, creating a Comment object, but doesn't persist the object to storage. This gives us time to alter the object before it's written to the database.

We alter it by setting the author property. We can't just set it with the ID (the PK) of the user. We need a User object, so we do a lookup based on the PK.

Then, we save() again, this time, allowing it to save to the database.

Lastly, we redirect back to the comment list.
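The commit=False dance, reduced to plain Python (FakeForm and the DB list are made up for illustration; this imitates the shape of ModelForm.save(), not Django's actual code):

```python
DB = []  # stand-in for the database table

class FakeForm(object):
    # imitation of a ModelForm's save() and its commit flag
    def __init__(self, data):
        self.cleaned_data = data

    def save(self, commit=True):
        obj = dict(self.cleaned_data)   # build the object from the form
        if commit:
            DB.append(obj)              # persist immediately
        return obj

form = FakeForm({'text': 'first post'})
comment = form.save(commit=False)   # object exists; DB still empty
comment['author'] = 'paul'          # alter it before it's written
DB.append(comment)                  # the "second save" persists it
```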

And, finally, the login and logout views.

class LoginView(View):
    def post(self, request, *args, **kwargs):
        user = auth.authenticate(
            username=request.POST['username'],
            password=request.POST['password'])
        if user is not None:
            if user.is_active:
                auth.login(request, user)
            else:
                messages.error(request, 'Account not available.')
        else:
            messages.error(request,
                'Password incorrect or account not available.')
        return http.HttpResponseRedirect(reverse('comment_list'))

    def get(self, request, *args, **kwargs):
        # we should never get to this codepath
        return http.HttpResponseRedirect(reverse('comment_list'))

class LogoutView(View):
    def get(self, request, *args, **kwargs):
        auth.logout(request)
        return http.HttpResponseRedirect(reverse('comment_list'))

The logout is self-explanatory, but login is not. It's based on the boilerplate from the Django docs.

Authentication and login is a two-step process. You can authenticate a user who is not active - that is, the user account exists, but they haven't verified, or a moderator hasn't verified the user.
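The two-step idea, reduced to a plain-Python sketch (the users dict is made up; in Django, authenticate() and login() do these jobs):

```python
# authenticate() only answers "are the credentials right?";
# whether the account may log in (is_active) is a separate check.
users = {'paul': {'password': 'password', 'is_active': False}}

def authenticate(username, password):
    user = users.get(username)
    if user is not None and user['password'] == password:
        return user
    return None

user = authenticate('paul', 'password')
# the credentials check passes even though the account is inactive,
# so the view must still refuse to log this user in
```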

The template for the list must also be revised. I'll paste the entire comment_list.html file, since it's changed a bit:

{% extends "base.html" %}

{% block content %}
    {% for comment in object_list %}
    <div class="comment">
        <p>{{ comment.text }}</p>
        <p><a href="{% url 'comment_edit' comment.pk %}">edit</a>
            <a href="{% url 'comment_delete' comment.pk %}">delete</a></p>
    </div>
    {% endfor %}

    {% if messages %}
    <ul class="messages">
        {% for message in messages %}
        <li{% if message.tags %} class="{{ message.tags }}"{% endif %}>{{ message }}</li>
        {% endfor %}
    </ul>
    {% endif %}

    {% if is_authenticated %}
        User: {{user}} | <a href="{% url "logout" %}">logout</a>
        <form method="post" action="{% url "comment_create" %}">
            {% csrf_token %}
            {{ form.as_p }}
            <input type="submit" />
        </form>
    {% else %}
        <form method="post" action="{% url "login" %}">
            {% csrf_token %}
            {{ loginform.as_p }}
            <input type="submit" />
        </form>
    {% endif %}
{% endblock %}

All the changes are at the bottom. There's a test for is_authenticated that switches between the login form and comment form. There's an added logout link.

Above this is boilerplate that displays error messages.


This application is better, but still incomplete. Also, we're doing our own login form, when we should be using Django's provided authentication views.

The built-in auth, plus the add-on django-registration-redux, is a really confusing pile of software, but I figure it's a good investment to learn it, use it, and extend it, because right off the bat, it saves you from writing several screens of user interaction.

Attachment: minimal3.tgz (15.97 KB)

Django 1.8 Tutorial - 4. Integrating the Default Login Screens, Adding HTML Email

So, I started implementing the Django-provided user login screens yesterday, and it required a ton of reading to get the different parts working. It seems so simple from the outside, but all the configuration options made it seem more difficult than it really is.

In the attached file, a few of the old configs and views have been deleted, but I'm not going to cover that here. Just do diffs between the contents of the attached tgz files to see the differences.

Create a file in your global templates, templates/registration/login.html:

{% extends "base.html" %}

{% block content %}

{% if form.errors %}
<p>Your username and password didn't match. Please try again.</p>
{% endif %}

<form method="post" action="{% url 'django.contrib.auth.views.login' %}">
{% csrf_token %}
<table>
  <tr>
    <td>{{ form.username.label_tag }}</td>
    <td>{{ form.username }}</td>
  </tr>
  <tr>
    <td>{{ form.password.label_tag }}</td>
    <td>{{ form.password }}</td>
  </tr>
</table>

<input type="submit" value="login" />
<input type="hidden" name="next" value="{{ next }}" />
</form>

<a href="{% url 'password_reset' %}">forgot password?</a>

{% endblock %}

In settings.py, set LOGIN_REDIRECT_URL to the page users land on after they log in. Mine redirects to the comment list.


There are ways to override this, so you can redirect to the correct page in the event they started logging in from a different page.

Then, you need to copy all the registration templates included with the django package. If you're using a virtualenv, they're inside it, in a path that ends with django/contrib/admin/templates/registration/.


Edit each file to work with your site. The general pattern I used was to change the 'extend' statement to extend "base.html", and to remove blocks that didn't exist in my template.

This is going to take a while. Once you're done with this tutorial, you should have ten separate templates in registration. So, set aside at least a day to implement just the HTML for login and password reset and changing.

While you're editing this, you may want to know what the URLs are. Here's a link to the sources.

Email Configuration
The default email server is localhost, and you're probably not running a mail server, so you need to configure email. My configuration looks like below - SSL, and no authentication:

# commented settings below

This server happens to also host my email, so the round trip is quick.

Sending HTML Emails

The default password reset machinery uses EmailMultiAlternatives, which allows for multipart emails with HTML and text alternatives. The password_reset view supports both a text email template and an HTML email template, but the HTML template isn't enabled by default, and it's a little tricky to turn on.

First, you need to pass a parameter named 'html_email_template_name' to the password_reset view. If you're using django.contrib.auth.urls instead of manually setting up the URLs, you need to override one url, accounts/password_reset/.

To do this, you need to define a url regex for that before the inclusion of the canned urls. The urlpatterns in urls.py should look like this:

from django.conf.urls import include, url
from django.contrib import admin
import comment.urls
from django.contrib.auth.views import password_reset

urlpatterns = [
    url(r'^accounts/password_reset/$', password_reset,
        {'html_email_template_name': 'registration/password_reset_email_html.html'},
        name='password_reset'),
    url(r'^c/', include(comment.urls)),
    url(r'^admin/', include(admin.site.urls)),
    url(r'^accounts/', include('django.contrib.auth.urls')),
]

The router reads the urlpatterns from first to last, dispatching to the first match. So it'll match accounts/password_reset, and call it with the additional named parameter.
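First-match dispatch is easy to model with plain re. (The patterns and "view" labels below are just stand-ins for the real urlpatterns.)

```python
import re

# ordered like the urlpatterns: the specific override comes first
patterns = [
    (r'^accounts/password_reset/$', 'password_reset with html template'),
    (r'^accounts/', 'canned django.contrib.auth.urls'),
]

def dispatch(path):
    for regex, view in patterns:
        if re.match(regex, path):
            return view   # first match wins
    return None
```

The override only catches the exact password_reset path; everything else under accounts/ falls through to the canned urls.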

Create the file registration/password_reset_email_html.html. This is an HTML version of the default template:

{% load i18n %}{% autoescape off %}
<table cellpadding="5">
  <tr>
    <td>
      <font face="Arial">
{% blocktrans %}You're receiving this email because you requested a password reset for your user account at {{ site_name }}.{% endblocktrans %}
<br />
<br />
{% trans "Please go to the following page and choose a new password:" %}
<br />
{% block reset_link %}
<a href="{{ protocol }}://{{ domain }}{% url 'password_reset_confirm' uidb64=uid token=token %}">{{ protocol }}://{{ domain }}{% url 'password_reset_confirm' uidb64=uid token=token %}</a>
{% endblock %}
<br />
<br />
{% trans "Your username, in case you've forgotten:" %} {{ user.get_username }}
<br />
<br />
{% trans "Thanks for using our site!" %}
<br />
<br />
{% blocktrans %}The {{ site_name }} team{% endblocktrans %}
      </font>
    </td>
  </tr>
</table>
{% endautoescape %}

The layout being used is a "table layout", which was common in the late 90s and early 2000s, before better support for CSS in email. CSS in email is still a problem, and coders still use tables. You can look it up.

When you test the format, you can usually switch between HTML and plain text email via a menu item.

Class Based Views? Nope

If you look at the sources, you'll notice that this part of the application is largely lifted from the admin, and it's written in the old style, with function-based-views.

Consequently, it is a little easier to override the default behavior - you copy the code and alter it. The problem, of course, is that this is a brittle technique. Class based views are harder to learn, but easier to modify. Theoretically, the code is more reusable, but I'm not entirely convinced of that. Given these trade-offs, if I were going to reimplement one of the auth views, I'd probably use the View classes.


Right now, we're at 595 lines of code across 27 files. So, it's not a lot of code, but it's a lot of files. In UX terms, it's around 12 or 13 screens or panes on a storyboard. So, it's worth using Django's default authentication pages.

Attachment: minimal4.tgz (16.04 KB)

Django 1.8 Tutorial - 5. Django Registration Redux

This text is a work in progress. I'm not even done with this part myself.

In doing some digging, I found out that the leading registration app, django-registration, was abandoned. Some time later, django-registration-redux picked up the ball and has maintained it. There's also another alternative django-allauth, which does registration and integrates with social sites.

The instructions for django-registration-redux are pretty good. Just read through them, and consider this tutorial just a slight gloss of what's covered, plus some specifics about our demo app.

First, add 'registration' to INSTALLED_APPS, and define ACCOUNT_ACTIVATION_DAYS and REGISTRATION_AUTO_LOGIN in settings.py.

Registration includes some urlpatterns from the django auth. You should put the registration urls before the django auth urls.

    url(r'^accounts/', include('registration.backends.default.urls')),
    url(r'^accounts/', include('django.contrib.auth.urls')),

When you order them like this, the registration urls will be handled first, then the default auth will capture the old URLs.

Though the old urls and the new ones can co-exist, having multiple logins and logouts will just confuse people. It also confuses developers, so you should fix things to prefer the new urls.

The url names in registration are different - they start with "auth_", so "login" is now "auth_login", "logout" is now "auth_logout", and so forth. Here's the list:

^accounts/ ^activate/complete/$ [name='registration_activation_complete']
^accounts/ ^activate/(?P<activation_key>\w+)/$ [name='registration_activate']
^accounts/ ^register/complete/$ [name='registration_complete']
^accounts/ ^register/closed/$ [name='registration_disallowed']
^accounts/ ^register/$ [name='registration_register']
^accounts/ ^login/$ [name='auth_login']
^accounts/ ^logout/$ [name='auth_logout']
^accounts/ ^password/change/$ [name='auth_password_change']
^accounts/ ^password/change/done/$ [name='auth_password_change_done']
^accounts/ ^password/reset/$ [name='auth_password_reset']
^accounts/ ^password/reset/complete/$ [name='auth_password_reset_complete']
^accounts/ ^password/reset/done/$ [name='auth_password_reset_done']
^accounts/ ^password/reset/confirm/(?P<uidb64>[0-9A-Za-z_\-]+)/(?P<token>.+)/$ [name='auth_password_reset_confirm'] 

You'll need to update your templates to reflect this change.

Just look for login, logout, password_reset, password_change, and fix them. (If you're on Unix, you can use grep.)

I had to change comment_list.html, login.html, logged_out.html, password_reset_email.html, password_reset_email_html.html.

You must also alter the URL override in urls.py that enables the HTML email for password resets, so it matches the new password reset URL and has the name "auth_password_reset".


At this point, the system should function, except there's no registration. You need to make the following templates. See the docs for details about what variables are defined in the context.


These are all in the tgz file.

If you'd rather start from scratch, here's a command to create the files.

cd templates; cd registration; touch registration_form.html registration_complete.html activate.html activation_complete.html activation_email_subject.txt activation_email.txt activation_email.html

You can also find templates on Github and in some packages.

A small modification
A view with a few links was added to the root of the site. This is to make it easier to test the registration features.

I'm going to do Allauth soon, but not yet. It's a bit more involved than this.

Attachment: minimal5.tgz (16.74 KB)

Django 1.8 Tutorial - 5.1 Alternative Authentication (Make Your Own)

I was hoping to get into Allauth, or maybe that and some more UX tweaks, but ended up hitting some walls. We're trying to implement an authenticator that works with an external service, but it's more complex than expected. (I won't be posting it here.)

To get to the point where I could even write parts of the authenticator, I needed to teach myself how to write authenticators. So, here are two authenticators. Both are a little odd, but I think the code is short enough that it won't be confusing.

If it gets confusing because I'm using a couple libraries - study the libraries. If you're not already familiar with subprocess and requests, plan to spend a few more hours to learn those great libraries. It's worth the effort.

The two authenticators are auth_command and auth_proxy_http. Auth_command executes an external command, and allows the login if it passes. Auth_proxy_http makes a request to another URL; the URL must be set up to accept an HTTP-Authorization username and password. If it returns a status code of 200 to 299, it's considered authorized.

To make auth_command, you use

./manage.py startapp auth_command

Then, in that directory, create this command in the file "command":

#! /usr/bin/python
# Usage: command username password
# Returns 0 on success, 1 on failure.
import sys

passwd = {
    'paul': 'password',
}

try:
    username = sys.argv[1]
    password = sys.argv[2]

    if passwd[username] == password:
        sys.exit(0)
    sys.exit(1)
except Exception as e:
    sys.exit(1)

This is a simple unix command that checks the passwd dictionary to look up a password.

Then, make a file in the same app, backends.py:

from django.contrib.auth.models import User
import subprocess, inspect, os

class AuthCommandBackend(object):
    def authenticate(self, username=None, password=None):
        try:
            command = os.path.dirname(inspect.getfile(AuthCommandBackend))
            command = '%s/command' % (command,)
            # print 'running command %s' % (command,)
            # run the command
            exitcode = subprocess.call(
                [command, username, password])
            if exitcode == 1:
                return None
        except Exception as e:
            return None

        user, created = User.objects.get_or_create(username=username)
        if (created):
            user.set_password(password)
            user.save()
        return user

    def get_user(self, user_id):
        try:
            return User.objects.get(pk=user_id)
        except User.DoesNotExist:
            return None

It checks that the command accepts the username and password. If it doesn't, it returns None. Otherwise, it proceeds to get a Django user with the same username, creating it if it doesn't exist.

It also sets the password. This means that future logins are performed against the User, not this other command. If you want the command to always verify the password, don't set a password on the created model. Then, the login for the User will always fail, and fall back to using the command. (Think about this a bit before deciding what to do.)
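The fall-through behavior can be sketched without Django (both backend functions here are stand-ins; Django's real backends are classes, and authenticate() is the framework's, not ours):

```python
def model_backend(username, password):
    # stand-in for ModelBackend: no usable password was stored for 'paul'
    stored = {}
    return username if stored.get(username) == password else None

def command_backend(username, password):
    # stand-in for the external-command check
    passwd = {'paul': 'password'}
    return username if passwd.get(username) == password else None

def authenticate(backends, username, password):
    # Django tries each configured backend in order;
    # the first one to return a user wins.
    for backend in backends:
        user = backend(username, password)
        if user is not None:
            return user
    return None

user = authenticate([model_backend, command_backend], 'paul', 'password')
```

Because the model backend finds no usable password, the login falls through to the command backend, which is the behavior described above.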

To use this authentication app, you add the backend to AUTHENTICATION_BACKENDS in the settings:

AUTHENTICATION_BACKENDS = (
    'auth_command.backends.AuthCommandBackend',
    'django.contrib.auth.backends.ModelBackend', # default
)

Once this is done, you can try to log into the comment section with the username "paul" and the password "password".

Auth_proxy_http is a similar authenticator, but it uses the network.

It uses the Requests library, so install it with:

pip install requests

Do a

./manage.py startapp auth_proxy_http

then create backends.py:

from django.contrib.auth.models import User
import requests

class AuthProxyHttpBackend(object):
    url = 'http://localhost/protected/'

    def authenticate(self, username=None, password=None):
        try:
            r = requests.get(self.url, auth=(username, password))
            if r.status_code < 200 or r.status_code >= 300:
                return None
        except Exception as e:
            return None

        user, created = User.objects.get_or_create(username=username)
        if (created):
            user.set_password(password)
            user.save()
        return user

    def get_user(self, user_id):
        try:
            return User.objects.get(pk=user_id)
        except User.DoesNotExist:
            return None

Set the URL to whatever you want. We should really read the URL from a setting, and add some notes about overriding this class to specify a different URL.

The code is similar to the command authenticator, except it checks http://localhost/protected/, which, presumably, is protected with a password in the HTTP-Auth style. This is the classic .htaccess file technique.
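What requests does with auth=(username, password) is HTTP Basic authentication: it base64-encodes "username:password" into an Authorization header, which is exactly what the .htaccess protection checks. A sketch of the header construction:

```python
import base64

def basic_auth_header(username, password):
    # the Authorization header Apache's Basic auth expects
    token = base64.b64encode(('%s:%s' % (username, password)).encode()).decode()
    return 'Basic ' + token

print(basic_auth_header('test', 'secret'))  # Basic dGVzdDpzZWNyZXQ=
```

Note that Basic auth is only obfuscation, not encryption, so the proxied URL should be served over SSL.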

You would create a file in the website's /protected/ directory called .htaccess, with this content:

AuthType Basic
AuthName "restricted area"
AuthUserFile /home/johnk/Sites/riceball/htpasswd
require valid-user

Then, you create the file specified in AuthUserFile, like this:

htpasswd -c /home/johnk/Sites/riceball/htpasswd test

Then enter the password.

Then, add to your settings:

AUTHENTICATION_BACKENDS = (
    'auth_proxy_http.backends.AuthProxyHttpBackend',
    'django.contrib.auth.backends.ModelBackend', # default
)

Now, logins to your Django app will be checked against that URL. If you add someone to the htpasswd, they'll also gain access to the Django site.

Django 1.8 Tutorial - 5.2 Adding a User Profile

In the previous section, I noted that I needed to learn how to make my own authenticator. I also needed to learn to add fields to the User. There are two ways to do it, and the Django docs don't clearly state which is better, for what situations.

I think the old way of adding fields, called User Profiles, is the way to go. It's more code, but I think the code ends up being a little more explicit and easier to read. It also allows you to make multiple different profiles for different things.

The other way, of subclassing AbstractUserModel to create a custom user model, and then specifying that as the user model, is more properly OOP, but is less portable and less reusable.

My example below has two dependencies: the regular auth system and the django-registration-redux library.

A user profile module is simple. There are only four files: models.py, views.py, urls.py, and admin.py.

New fields are added to models.py:

from django.db import models
from django.conf import settings
from django.db.models.signals import post_save
from django.contrib.auth.models import User
from registration.signals import user_registered

class UserProfile(models.Model):
    user = models.OneToOneField(settings.AUTH_USER_MODEL, primary_key=True)
    homepage = models.URLField()

def assure_user_profile_exists(pk):
    """
    Creates a user profile if a User exists, but the
    profile does not exist.  Use this in views or other
    places where you don't have the user object but have the pk.
    """
    user = User.objects.get(pk=pk)
    try:
        # fails if it doesn't exist
        userprofile = user.userprofile
    except UserProfile.DoesNotExist:
        userprofile = UserProfile(user=user)
        userprofile.save()

def create_user_profile(**kwargs):
    assure_user_profile_exists(kwargs['user'].pk)

user_registered.connect(create_user_profile)

This code adds a homepage field. Remember those?

The rest of the code is used to create UserProfiles when User objects are created, or to create them for existing users.

Read up on Signals, to understand what this means. It causes the function create_user_profile to execute whenever the user_registered signal is raised.
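A stripped-down stand-in for the signal machinery (this is not Django's implementation, just the shape of it):

```python
class Signal(object):
    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, **kwargs):
        # raising the signal calls every connected receiver
        for receiver in self.receivers:
            receiver(**kwargs)

user_registered = Signal()
profiles_created = []

def create_user_profile(**kwargs):
    profiles_created.append(kwargs['user'])

user_registered.connect(create_user_profile)
user_registered.send(user='paul')   # registration fires the signal
```

The registration app fires its signal once per new account, so the handler runs exactly when a profile is needed.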


The latter situation is handled in views.py:

from django.views.generic import DetailView
from django.views.generic import CreateView, UpdateView
from django.contrib.auth.models import User
from .models import UserProfile, assure_user_profile_exists

class UserProfileDetail(DetailView):
    model = UserProfile

class UserProfileUpdate(UpdateView):
    model = UserProfile
    fields = ('homepage',)

    def get(self, request, *args, **kwargs):
        assure_user_profile_exists(kwargs['pk'])
        return (super(UserProfileUpdate, self).
                get(request, *args, **kwargs))

Just like before, we use the generic model views, but there's one twist here. When we attempt to update the profile, we do a "get", and that calls assure_user_profile_exists, to force the creation of this UserProfile.

Then it just returns the regular results.

There is one thing that's missing, which is the deletion view. That's because deletions should happen on the User. We should also link up a deletion function with the signals that are raised when users are deleted - but I haven't written that yet.

There are only two urls in urls.py:

from django.conf.urls import url
import views

urlpatterns = [
    url(r'^(?P<pk>\d+)/$', views.UserProfileDetail.as_view(),
        name='userprofile_detail'),
    url(r'^(?P<pk>\d+)/edit/$', views.UserProfileUpdate.as_view(),
        name='userprofile_update'),
]
These work with two templates. These are both in user_profile/templates/user_profile/:

userprofile_detail.html:

{% extends "base.html" %}

{% block content %}
<h2>{{ object.user.username }} Profile</h2>
<p>Homepage: {{ object.homepage }}</p>
{% endblock %}


userprofile_form.html:

{% extends "base.html" %}

{% block content %}
    <h2>Edit {{ object.user.username }} Profile</h2>
    <form method="post">
        {% csrf_token %}
        {{ form.as_p }}
        <input type="submit" />
    </form>
{% endblock %}

There's also an admin.py to add the fields to the admin:

from django.contrib import admin
from django.contrib.auth.admin import UserAdmin
from django.contrib.auth.models import User
from user_profile.models import UserProfile

class UserProfileInline(admin.StackedInline):
    model = UserProfile
    can_delete = False

class UserAdmin(UserAdmin):
    inlines = (UserProfileInline,)

admin.site.unregister(User)
admin.site.register(User, UserAdmin)

This is a lot of code to monkey-patch the admin so it uses our version of UserAdmin to edit the users.

So, no code for this example. It'll be in the next part, when I try to integrate some of these things into the minimal app, which is less and less minimal by the day.

Django migrate MySQL error 1005 105, can't create table

When you have Django's migrations making foreign keys, you might hit this error, number 1005 or 105.

This may be happening because foreign key constraints can be applied only to identical columns that are unique.

So, check that they're unique, and add an index. (If you try to add the index, it'll fail if the values are not unique.)


Then, if you still get the error, check that the character sets are the same on both tables. (I don't think Django's db reflection keeps track of that.)


Also, if the charsets are different, check the default charset for the table. Make sure it's "utf8".


Email Obfuscation and Shielding Script

Here's a perl script that takes email addresses as arguments, and returns javascript code that hides your email address from web spiders. The email address is also linked so it's clickable.

#! /usr/bin/perl

foreach my $email (@ARGV) {

        $email =~ s/@/ @ /;
        $email =~ s/\./ . /g;

        @parts = split( ' ', $email );

        print "<script type='text/javascript'>\n";
        print "document.write('<a href=\"mailto:');\n";
        foreach my $word (@parts) {
                print "document.write('".$word."');\n";
        }
        print "document.write('\">');";
        foreach my $word (@parts) {
                print "document.write('".$word."');\n";
        }
        print "document.write('</a>');\n";
        print "</script>\n\n";
}

Erasing Hard Drive Data

The "gold standard" in this category is "Darik's Boot and Nuke" or DBAN (pronounced D Ban).

DBAN is a tiny version of Linux, usually run from a diskette or CD, that contains a program that will erase any hard disk on your computer. It has several different methods, many of which are used by the military to securely erase data.

The reason for such a tool is that, even if you erase the data, and write new data, the old data can still be extracted by skilled technicians. DBAN repeatedly writes to the disk, with different patterns of data, to make it more difficult to find the old data on the disk.

FFMpeg to Encode Flash Videos

Here is a command line to encode to low-resolution Flash video for publishing on a web server (that isn't a streaming server).

cat FASTING/VIDEO_TS/VTS_01_*.VOB | /usr/local/bin/ffmpeg -i - -f flv -s 320x240 -acodec libmp3lame -ar 22050 -ab 64k fasting.flv

What this does is dump all the .VOB files into ffmpeg, which is told to take input from standard input.

Firefox Stopped Working

Firefox stops working. No error messages. No crashes. It just stops. Also, reinstalling doesn't fix it.

Here's how I got out of that situation:
* Uninstall Firefox.
* Go into C:\Program Files\ and move the Mozilla Firefox folder to the desktop.
* Restart the computer.
* Move the Mozilla Firefox folder to the Recycle Bin, and empty it.

At this point, most folks can reinstall, but if you used FrontMotion Firefox (a Firefox installer that comes as an MSI file), you will have to go into the registry, search for FrontMotion, and then delete the CLSID for that version of Firefox. More details below.

* Reinstall.
* Firefox should start up fine.

More info at MozillaZine.

Using Regedit to Alter the Registry

Uninstalling the old FrontMotion Firefox seemed to leave behind a registry key that may be impeding a new install of Firefox. So you have to remove that object.

* In RegEdit, press F3 to open the search box.
* Search for "FrontMotion" or whatever you think might work.
* Look for the string that's inside a CLSID object.
* Delete that object.

Also published at eHow.

Flash on 64-bit Ubuntu Linux (Yet another Howto)

[This is obsolete.]

Yes, yet another short tutorial.


1. Install the 32-bit compatibility libraries:

sudo apt-get install -y ia32-libs lib32asound2 lib32ncurses5 ia32-libs-sdl ia32-libs-gtk gsfonts gsfonts-x11 linux32

2. Download Firefox. Unpack it.

3. Download the Flash installer, version 9. Unpack it.

4. Go into the Flash installer directory, and

linux32 ./flash-installer

(or whatever it is called).

5. Go into the Firefox directory, and

linux32 ./firefox

My install doesn't have the nicer GTK2 widgets. It reverted to the old style widgets, which look kind of "Win95". Still, it's worth it to get my occasional viewings of various online video sites.

What's Going On?

The Flash plugin works only in a 32-bit environment. The linux32 command makes the system report a 32-bit architecture to the command it launches (here, firefox), and the various ia32-* packages supply the 32-bit libraries that environment needs.

In other words, to simulate a 32-bit environment, you tweak what the system reports as its hardware, and install a bunch of libraries that 32-bit code can link against. Pretty simple.

Formatting Email So It Looks the Same on All the Clients and Browsers

I was having lots of problems with HTML email layouts. After doing some research, I came up with a method to get almost pixel-perfect positioning and sizing. It's not that hard.


Most of the tutorials out there about HTML email range from okay to incorrect. Maybe they're just too old, and the current email systems are better. You should skim over those, and read one carefully.

[New info: the best tutorial is HTML Email Development Tips by Tom Elliott.]

Before that, the best tutorial was Using CSS in HTML Emails, The Real Story, by Chris Coyier. Also, read the great guide from MailChimp, and skim some of the discussions in the list of resources below to get an idea of the scope of the problem.


Here is a rough "theory", or methodology, of how to make cross-platform email. First, to simplify things, we'll ignore the Mac and iPhone email readers. It's not as bad as it sounds. The basis for rendering email in 2010 is the HTML rendering engines. There are two dominant engines: Mozilla's Gecko and Internet Explorer's Trident. The third is Apple's WebKit, used in Safari, which started as a fork of KDE's Konqueror engine (KHTML). Also, Outlook 2007 started using the MS Word rendering engine, messing things up for them. But, generally speaking, standards-compliant HTML is the basis for rendering HTML email.

Not only are these HTML rendering engines used in web browsers - they are also used within email reading clients like Outlook (up until Outlook 2007), Outlook Express / Windows Live Mail, Thunderbird, and Apple Mail.

Because these engines are largely standards-compliant, you would think that HTML email would render cross-platform. Unfortunately, this isn't the case. There are three reasons why: Yahoo, Gmail, and font sizes.

Yahoo and Gmail

Both Yahoo and Gmail alter the CSS of incoming email. Yahoo alters the basic CSS, and also strips out any CSS you use in the HEAD. Gmail strips out the CSS, and also alters the basic CSS, but differently.

Yahoo's most offensive alteration to CSS is the removal of vertical margins between paragraphs. Normally, people using the P tag expect some spacing to be added after the paragraph closes. Yahoo gets rid of that, so to produce a paragraph break, you need to add line breaks instead. The problem is that other email readers will then show extra white space.

Gmail is less offensive, but they still do some annoying things. Their first offense is to strip out the CSS in the HEAD, so all your document-wide font styles vanish. Additionally, they re-define the H tags so the sizes conform to Chrome's sizes, more or less.

The other services may do things to CSS as well, but not so drastically.

Font Sizes

Gmail's CSS mangling has a reasonable intent: to fix the font sizing irregularities between Mozilla Gecko and Internet Explorer.

The font size problems are legendary, but, the gist of it is that Gecko's idea of an "xx-large" font is a lot smaller than IE's idea of an "xx-large" font, and this pattern continues for all font sizes. This also maps to their interpretation of the FONT tag's SIZE attribute.

(There's also the problem of different font sizes and pixel sizes between PCs and Macs.)

Other Issues

In addition to inter-paragraph spacing and font sizes, some services alter the line spacing. Yahoo seems to mess with the cellspacing on tables, as well.


The solution is to use a subset of HTML, with a subset of CSS, and also use tables. This subset is not the one you'd expect: you don't stick with old-fashioned HTML; you follow new rules. Here they are.

Don't use P, use BR
Yahoo messes up the P tag, so you need to use BR instead of P. You may need to initially use P to wrap the entire message, but you can use DIV as well.

Don't use FONT, use CSS font styles
Don't use FONT anywhere. Instead, use SPAN with a STYLE attribute:

<SPAN STYLE="font-family: Verdana; color: #009900; font-weight: bold;">Verdana, bold and green</SPAN>

The cross-platform CSS properties are: font-size, font-family, font-weight, and color.

You can still use the B and I tags!

Don't use xx-large, use "font-size:24px;"
Don't use relative font sizes anywhere. They won't display identically on Outlook and Hotmail, nor will they display the same on Mozilla Firefox and Internet Explorer. Instead, use absolute pixel sizes, and use them only within those STYLE attributes.

Don't use the H1-H6 tags either. Those rely on relative sizings.

(The Mac or the new iPhone may display the text differently.)

Use the IMG tag freely.
Images work. In fact, the HSPACE and VSPACE attributes are honored cross-platform, so they're a good way to get around issues of horizontal and vertical margin spacing, if used creatively.

You can use the UL and OL lists
Yes you can!

Edit with NVU
So far, this has been the editor that lets you follow the above rules without too much pain. It uses BR instead of P, and allows you to edit inline styles with a couple of clicks.

The other option is to edit by hand, which isn't so bad: the limited set of tags makes things "simple."

(The real solution is to alter the HTML output of a web-based editor like TinyMCE, and then paste the altered code into the email software.)
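For what it's worth, that alteration step could be automated. Here's a hedged Python sketch (my own illustration, untested against real clients) that applies the "use BR, not P" rule to a chunk of editor output:

```python
import re

def email_safe(html):
    """Convert paragraph markup to explicit line breaks.

    A rough sketch of the 'alter the editor output' idea: some webmail
    clients strip the P tag's margins, so each closing P tag becomes a
    <br><br> pair and the opening tags are dropped.
    """
    html = re.sub(r"<p\b[^>]*>", "", html, flags=re.IGNORECASE)        # drop opening P tags
    html = re.sub(r"</p\s*>", "<br><br>", html, flags=re.IGNORECASE)  # close becomes two BRs
    return html

# email_safe("<p>Hello</p><p>World</p>") returns "Hello<br><br>World<br><br>"
```

A real version would also inline the font styles, per the rules above.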

Using the above rules, you can produce nearly pixel-perfect cross-platform email. The line breaks fall in the same places across browsers and email systems (on Windows).

Items under research.
Line spacing is inconsistent. As a quick fix, you can use spacer images.
Table spacing is inconsistent. The Coyier article covers tables.
DIV spacing still needs some research.


A good tutorial from Tim Slaving, with lots of technical details.

List of resources and discussion at Email Marketing Reports.

CSS feature matrix at Campaign Monitor.

How to code email newsletters, a Sitepoint tutorial.

Also check out Premailer, a web app that alters your HTML so it works in email. Unfortunately, they did things in ways that won't work with Yahoo.

FreeDOS Boot USB Flash Memory Stick

If you don't have Windows around anymore, and you need to flash your BIOS, you need to figure out a way to make a bootable floppy.

If you don't have a floppy, you can make a bootable CD.

If you lack a CD burner (like I do), you can use a USB memory device.

p-code has a great way to do it - no more downloading Win32 utilities, or trying to make the disk bootable from within Unix. Just run an emulator, and pretend the USB disk is a hard drive.

This is such a clean way to do it - you're relying on the correct low-level emulation of the hardware, and using trusted DOS-based utilities to construct the boot disk.

GIMP HTML Trick: Paste Color Names

If you paste an HTML color name, like "palegoldenrod", into the color picker, the picker will be set to that color.

Gentoo Linux on a Medion MIM 2040 (MD 42100)

I received a somewhat old, but very nice Medion MIM 2040 laptop from J--- who I sometimes work with. Thanks man. These are some notes about getting Gentoo Linux running on it. Ubuntu came installed, and worked fine, but I had the urge to make it run faster, thus, Gentoo.


The computer came with a nonfunctioning keyboard. The suspicion is that it was used on a pillow, so the fan was blocked, and overheated the keyboard. Perhaps the plastic melted. The computer gets hot because the airflow isn't very good. It's important to rest it on a plank of wood or some other hard surface. (Update - the keyboard started working. Not sure why.)

It's a MiNote 8089

This computer appears to be a MiNote 8089, or a Mitac 8089. Batteries and power adapters are available.


Here's the lspci output:

00:00.0 Host bridge: Intel Corporation 82852/82855 GM/GME/PM/GMV Processor to I/O Controller (rev 02)
00:00.1 System peripheral: Intel Corporation 82852/82855 GM/GME/PM/GMV Processor to I/O Controller (rev 02)
00:00.3 System peripheral: Intel Corporation 82852/82855 GM/GME/PM/GMV Processor to I/O Controller (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
00:02.1 Display controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 83)
00:1f.0 ISA bridge: Intel Corporation 82801DBM (ICH4-M) LPC Interface Bridge (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 03)
00:1f.5 Multimedia audio controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 03)
00:1f.6 Modem: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Modem Controller (rev 03)
01:01.0 Network controller: Intel Corporation PRO/Wireless 2200BG Network Connection (rev 05)
01:02.0 Ethernet controller: VIA Technologies, Inc. VT6105 [Rhine-III] (rev 8b)
01:03.0 CardBus bridge: Texas Instruments PCI1410 PC card Cardbus Controller (rev 02)


Wireless

The Intel 2200BG isn't supported by the stock kernel's drivers, so you need to install the sources and build a new kernel with the ipw2200 driver. You also need to "emerge net-wireless/ipw2200-firmware" to install a firmware compatible with the driver. After this, the system runs fine.

Install wpa_supplicant to do WPA encryption.


Keyboard

Seems to be a Mitac 8089 under the label. Mitac keyboards are available, but not cheap.


Video

Uses the "i810" driver for Intel chipsets. Set this in your xorg.conf file.

Hard Drive

Is accessible under a removable panel.


Sound

Compile the kernel with the "intel8x0" drivers. There's one for the sound card, and another to make the sound card act like a modem. In the kernel's menu-based config ("make menuconfig"), it's under Sound -> ALSA -> PCI Devices, and it's called "Intel/SiS/nVidia/AMD/ALi AC97 Controller" and "Intel/SiS/nVidia/AMD MC97 Modem". Enable them as modules.

"emerge alsa-tools alsa-utils alsa-oss libogg lame"

Run "alsaconf" to set up the sound. After a reboot, it should work, but the sound is quiet, so you need to use headphones. I'm still having problems getting KDE to use sound, but /dev/sound/audio will play .au files.

I'm not sure if lame, toolame, or twolame is best to use.


Modem

See the sound section.


Ethernet

The default Gentoo driver set picks up the Rhine III chipset.


Not set up yet.


Battery

I ordered one on eBay for around $60. It arrived from China around 4 weeks later, and it functions fine. The computer only runs around 3-4 hours on the battery, but this is probably due to Linux not being good about powering things down.


Firewall

To have a firewall running, enable netfilter in the kernel, then:

emerge iptables

Gentoo: My Configuration

The notebook computer says:

free ~ # smartctl --health /dev/hda
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge...

SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
  5 Reallocated_Sector_Ct   0x0033   001   001   050    Pre-fail  Always   FAILING_NOW 1023

Yikes! That's the first time I've seen FAILING_NOW. No wonder I get error messages.

The good news is that virtually all my data is on servers. Email is IMAP, bookmarks are in Foxmarks, music and other stuff is on a desktop machine, code is in SVN. The only thing not saved is my system configuration, which is a time-consuming task. So, here goes:

free ~ # cat /var/lib/portage/world

free ~ # cat /etc/make.conf
# These settings were set by the catalyst build script that automatically built this stage
# Please consult /etc/make.conf.example for a more detailed example
CFLAGS="-O2 -march=i686 -pipe"


USE="X x11 kde qt4 qt3support qt3 dbus hal mysql mysqli ctype pcre session unicode cgi jpeg png alsa acpi firefox gtk lame ogg mp3 mpeg nas ncurses"
# note, cgi is for php :(

free ~ # cat /etc/lighttpd/phpmyadmin.conf
alias.url += (
        "/phpmyadmin/" => "/usr/share/webapps/phpmyadmin/"
)

free default # pwd ; ls -l
/etc/runlevels/default
total 0
lrwxrwxrwx 1 root root 20 Nov 25 15:50 lighttpd -> /etc/init.d/lighttpd
lrwxrwxrwx 1 root root 17 Nov 23 12:32 local -> /etc/init.d/local
lrwxrwxrwx 1 root root 17 Nov 25 15:48 mysql -> /etc/init.d/mysql
lrwxrwxrwx 1 root root 20 Nov 23 14:06 net.eth0 -> /etc/init.d/net.eth0
lrwxrwxrwx 1 root root 20 Nov 23 12:32 netmount -> /etc/init.d/netmount
lrwxrwxrwx 1 root root 15 Dec  7 02:51 nfs -> /etc/init.d/nfs
lrwxrwxrwx 1 root root 19 Nov 25 15:49 postfix -> /etc/init.d/postfix
lrwxrwxrwx 1 root root 17 Nov 23 13:59 sshd -> ../../init.d/sshd
lrwxrwxrwx 1 root root 21 Nov 23 14:10 syslog-ng -> /etc/init.d/syslog-ng
lrwxrwxrwx 1 root root 22 Nov 23 14:11 vixie-cron -> /etc/init.d/vixie-cron
lrwxrwxrwx 1 root root 15 Nov 24 13:49 xdm -> /etc/init.d/xdm

That's the basic setup. lighttpd needs to be tweaked, and the phpmyadmin config file needs to be created.

The kernel config file, /usr/src/linux/.config, is attached. It's a work in progress, as usual.

config (48.04 KB)

Getting Blackberry Desktop to Work with MS Outlook

For some reason, the Blackberry Desktop software wasn't showing Microsoft Outlook as one of the PIMs to synchronize with. I tried versions 4.3, 4.5, and 4.6, with identical results. Intellisync's configuration wizard would appear, and you'd see the Contacts, Calendar, and other options. But when you clicked Setup..., a dialog box would open showing only Yahoo mail and Text file exporters.

The solution was to install the software using the "Work Email Address" setting, which is designed to be used with BlackBerry Enterprise Server (BES). Do this even if you don't have BES. During the setup, select BES for MS Exchange.

Make sure that MS Outlook is set as your default mail and address book provider. This is in the Control Panel's Internet Options, under the Programs tab.

For some reason, this will cause everything to work. Everything should default to syncing with MS Outlook.

It will also cause the Blackberry Desktop Redirector to start up. This may or may not cause problems. It's hard to tell.

Oddly, this solution isn't presented on the forums; I found it on a Zimbra website.

HTML CSS 3-column Layout with Content Above the Navigation

I was toying with some SEO ideas, and wanted a CSS-based layout that puts the content at the top.

After doing so much PITA CSS for a year or two, and then not doing it for a couple of years, it suddenly got really easy to make this layout. Maybe the CSS concepts just take time to sink in. It seemed to make more sense as I forgot the language.

Attached is a 3 column layout with two nav bars, a sidebar column, a footer, and a header. The content is right at the top, and all the navigation is between the content and the footer. The layout is fixed, not liquid, because liquid and wide-screen don't mix.

The code is a skeleton, not a functioning layout with all the elements in place. There's no CSS to turn lists into links, for example.

CSS Tricks has more information and a couple tricks to make this kind of layout work.

index.php.txt (1.17 KB)

Haskell Learning Notes

A couple years ago I tried to learn Haskell and dropped the study. I'm not sure what happened, but it's really hard to find a solid block of time to study it. Haskell syntax is so different from other languages that it's difficult to pick up.

So I'm writing these notes as a kind of alternative study to the (good) tutorial "Haskell for C Programmers" by Eric Etheridge.

Some syntax is simple. A list of numbers:

[1, 2, 3]

A string:

"This is a string"

Tuples, which are kind of like C structs or Pascal records:
( 1, "name", 5.0 )

The difference between tuples and lists is that tuples are always the same length, but can be different types. Lists are any length, but all the same type. They have totally different uses in Haskell.

Haskell is a functional language, meaning that functions are the common way to break a program down into smaller parts. Where Haskell functions differ from function definitions in languages like C or JavaScript is that Haskell functions are descriptive more than procedural.

Starting here, all code appears as it would in a source code file. (These files end in ".hs", by the way.) To use the code, you load it into Hugs98, and then you can call it from the command prompt. The command prompt is indicated with "Main>".

Functions are defined with the = sign:

foo x = [ 1, 2 ]

Now, this is a nonsense function. It's called like this:

Main> foo 4
[ 1, 2 ]

It always returns the same values, [ 1, 2 ].

The function name is "foo". The argument is called "x". The return value is always the list [ 1, 2 ].

Here's an equally foolish function:

foo x = [ 1, 2, x ]

Call it like this:

Main> foo 9
[1, 2, 9]

It substitutes the value of x for the last item in the list.

Here's a more useful function:

square x = x * x

Main> square 9
81

Yet more useful functions:
tax p = p * 0.0975
tip p = p * 0.20
totalTab p = p + (tax p) + (tip p)


While the popular tutorial starts with the example of a function that generates the list of Fibonacci numbers, I will do something simpler: filters on lists.

Here's a function that takes a list as an argument and returns the entire list. This is not a filter :)

notafilter lst = [ x | x <- lst ]

This is a list comprehension -- a statement that describes a list. This list is [ x ], where x is taken from lst (the argument to the function). The thing on the left of the | is an expression, and the thing on the right describes the elements. In this function, the expression is just x, but it could be more complex.

Here's a function that will filter in all the even numbers in a list of numbers:

evensFilter lst = [ x | x <- lst, mod x 2 == 0 ]

The expression after the comma (,) is a conditional. If its value is true, the element is included in the output list.

"mod x 2" is x modulo 2. For even numbers it evaluates to 0.

== is the comparison operator.

Here's a function that adds an "s" to each string:

pluralize lst = [ x ++ "s" | x <- lst ]

Main> pluralize [ "cat" , "dog" ]
["cats","dogs"]

And another one that turns verbs into nouns. Some verbs that is:

gerundize lst = [ x ++ "ing" | x <- lst ]

Main> gerundize [ "park", "crash", "turn", "run", "smoke" ]
["parking","crashing","turning","runing","smokeing"]

OK, so it's not that clever, but it's not bad for a one-line program.

And, finally, because this tutorial is on the web, here's a little html tag writing code:

blink str = tag "blink" str
tag t s = "<" ++ t ++ ">" ++ s ++ "</" ++ t ++ ">"

And a few more:

p s = tag "p" s
h1 s = tag "h1" s
h2 s = tag "h2" s
strong s = tag "strong" s
em s = tag "em" s
br = "<br />"

Main> p ("foo" ++ em "bar" ++ "baz")
"<p>foo<em>bar</em>baz</p>"

The parens set the order of operations. This example might be useful.

FYI, Haskell syntax notes.

Haskell Notes 2

I found a good tutorial at Wikibooks, Haskell. It's beginner level like these notes, but is way more organized. My notes here are more difficult to comprehend (due to lack of editing), but the examples are simple enough for some people to understand.

One thing I like about that book is that they start out without using type signatures. All the other tutorials use type signatures, even though they aren't required. They're really good form, but can get in the way of learning quickly.

Here's an example that converts a list of strings into a JSON list of strings. (Sort of - it uses single quotes. To emit real double quotes, you'd escape them inside the string literal, as in "\"".)

jsonList lst = "[" ++ ( jsonListJoin lst ) ++ "]"
jsonListJoin [] = ""
jsonListJoin (x:[]) = "'" ++ x ++ "'"
jsonListJoin (x:xs) = "'" ++ x ++ "'," ++ jsonListJoin xs

This defines three versions of jsonListJoin, and the correct one is dispatched by pattern matching. The first version matches only the empty list; it's reached when jsonList is called on an empty list, since the recursion otherwise stops at the one-element case.

The second version matches the end of the list, where you have one element followed by the null. It's just like the final version except without a comma after this argument, and without the recursive call to jsonListJoin.

The third version is the most general version, and it matches any situation where there's a list with two or more elements. The first item is taken and turned into a JSON string, and the remainder of the list is passed to jsonListJoin. There's a comma in this version.

How to Add a REST API to an Existing Django Project

This is a note I wrote to myself about how to add Django REST Framework to an existing project. It's in PDF format, for reading. I didn't have the time to create a real tutorial that builds up the API, or produce a really useful API. The intention is just to outline what parts get defined and how they work together.

django rest api - adding to existing project.odt (51.95 KB)
django rest api - adding to existing project.pdf (85.9 KB)

How to Keep Your Notebook Running Speedy

This is an addendum to the two articles about keeping Windows XP speedy. This article discusses a few issues relevant mostly to laptop computers.

Check the Disk

Portable computers are more likely to have disk problems than stationary computers. Running a disk scan gives the built-in hard-disk repair features a chance to operate.

Right click on the C: drive icon, select Properties, go to the Tools tab, and click Check Now. Check the option to fix bad sectors. It'll inform you that it cannot check the disk while Windows is running, but when you restart the computer, the disk check will run.

Scan for viruses more often

Laptops often end up connected to different networks. Each connection is an opportunity for infection. Scan after you travel with the computer into a foreign network.

Add Memory

Generally, notebooks start out "behind" in the RAM game, and as the updates accumulate, you hit the "wall" and start using virtual memory. That means tapping the hard disk, which, as noted above, is likely to have errors. Also, the hard drives tend to be a little slower.

To reduce memory usage, review the other articles. Use MSCONFIG to alter what programs are being run at startup.

How to Stay Virus Free with Windows XP, the Bare Minimum

1. Get some kind of anti-virus software. Consumer Reports recommends PC-Cillin, which is cheap and doesn't bog the system down.

2. Start using Mozilla Firefox. It's attacked less often than Internet Explorer.

3. Avoid clicking on attachments. Avoid using MySpace. Avoid Yahoo Instant Messenger.

4. Get a copy of the Ultimate Boot CD for Windows, and learn to use it to clean the system of most viruses. What UBCD doesn't catch, the other antivirus software should catch.

5. Get a firewall/router. The one I like is the Linksys WRT54G, but any kind is fine. A hardware firewall will add some security by being a little harder to hack than a computer with firewall software. (You should still run the firewall software.)

6. Set up an extra user with limited access. Use this as your main account, dropping into the administrator (or computer_owner) account to install software.

How to Stay Virus-Free and Speedy with Windows XP

Every couple of months, someone asks me how to get their computer to go faster. Usually, they're relatively new to computers, and while they get around pretty well on the internet and know how to use their system, they don't always understand how to avoid being attacked by viruses or other "malware", or how to manage their system so it runs fast.

(Thanks to CSH, DKL, ECC, REG, and CEG for putting me to work dealing with these annoying computer issues. Also, thanks to BG of MS for operating the company that created this thing called Windows. Without them, this page wouldn't exist.)

Good Habits

Comfortable computer use is achieved by practicing good habits, and avoiding bad habits. Bad habits lead to pain. Everyone has some bad habits, and everyone will experience some pain, and I am no exception. I've been hit by viruses, had computers "cracked", and have lost data due to negligence. However, I've also managed to recover from most of these situations relatively unscathed.

This is a lengthy list of good habits. It's best to try each one out for a while, individually, and learn to integrate the good habits into regular use.

Good habits are hard to attain (just ask my doctor), so don't criticize yourself too much if you can't do all these things. It's just important that you try.

Three Types of Users: Administrator, Power User, Regular User

When you set up XP, it asked you to create a name and password for the computer owner (that's you). This is the Administrator account. You should not use the administrator account day-to-day. XP also asked you to create a Power User, to use the computer regularly. You should also not use the Power User day to day.

Instead, you should create a third user, who is a regular user. A regular user is restricted from installing new software and hardware on the computer. This includes "plug ins" or "active x controls" on websites. You should use the regular user account as your main account.

Very quickly, you'll notice that web pages, and some emails, ask you to install software. When this happens, you should click on the Start Menu, click "Log Off", click "Switch User", and then log in as the power user. Then, you can go back to the website, and install the software.

Personally, I tend to use the power user account, but novices should use the regular user because it forces you to learn about all the situations when software is trying to execute. (It doesn't happen just anywhere.) After a while, you'll figure out situations when you're likely to be asked to install software, and then make a conscious decision about whether it's worth it or not.

Use "Add or Remove Programs"

This is advice for people who install or "try out" a lot of software. If you don't do that, skip this section.

The Add or Remove Programs tool in the Control Panel should be used once every couple of months to remove any old software you're not using.

Some programs cause the startup and shutdown sequence to launch other programs "in the background". These are programs that don't show up in the task bar, but do show up in the "Task Manager" application, under the "Processes" tab. (To use the Task Manager, right click in the task bar, and it's one of the menu options.)

These "background" programs consume some memory, and use some processor time. They're designed to be sparing with their usage, but, when you have dozens of programs installed, they tend to add up.

Don't Install It

Don't install the customized cursors, Weather Bug, screensavers, or browser toolbars. I know you want to do it, but, some of these things are "spyware" and consume processor resources. They may also "spy" on your web surfing and keystrokes, and send the information to a database.

That database is a big list of "suckers" or "easy marks" -- people who are willing to install software, and spend money online, without much concern for security. These online marketers will turn your personal information into ad campaigns directed at you, to take your money.

If you have installed it, you can try to uninstall it by referring to the previous section.

If that doesn't work, read on.

Reinstall Windows Occasionally

You should save all your data (see Backups below), and erase the hard disk, and re-install Windows every two years or so. This will wipe out all the junk. To do this, you need to do a little planning, and make sure you have all your information in order:

  1. Make sure you have CDs for all your software, including the CD Keys. If you don't know a CD Key, you can usually go into the Help -> About This Program menu item and find it displayed there.
  2. Make sure to write down the usernames and passwords you've stored on the computer. They might be inside your browser: in Mozilla, go to Tools -> Options... -> Security tab, and there's a button to view your passwords.
  3. Back up your various settings by going into My Computer -> C: drive -> Documents and Settings -> your user name. Then, go into the menu Tools -> Folder Options -> View tab, and select "Show hidden files and folders". You can then see the Application Data folder, and copy it to backup media.

If you have enough spare disk space, on a second disk, you should keep all your installers, especially the ones you download from the Internet. Now would be a good time to go and download the latest versions of your favorite software.

Finally, you can re-install Windows, or run the restore CD, and clean out your system.

Keep a Software Library

Get some large envelopes and some magazine storage boxes, and put your CDs and software license certificates in there. It'll take up some space, but, you need that information to reinstall your software (or to sell it).

Use MSConfig to Disable Annoying Startup Junk

Being somewhat inexperienced with Windows, I didn't know about this useful tool. It allows you to prevent startup programs from running. To use it, press Windows-Key-R (or Start -> Run...), type "msconfig", and press Enter.

Each tab shows you a little bit of the startup sequence. The most crowded area is the last tab, where apps like iTunes and Real Player install tiny programs that check for updates. They suck up some resources, and when there are enough of them, things can get slow. Flip them off by unchecking them. Of course, you can't just flip everything off, but if something goes wrong with one, you can turn it back on.

Scan for Viruses

I tend to not run any virus detection software. Instead, I just go to McAfee and run a free scan there. This is a way to check that my habits are working. I check around once every three months, but more often on new systems. Other vendors offer free online scans as well. If you have a virus, you should probably buy one of the products to disinfect yourself.

The new hot product is Kaspersky's virus scanner. They don't have a free version, but they do have trial versions.

Use Firewalls

Windows XP comes with a firewall, and for starters, you should use that.

To set it up, go to the Control Panel, then Security Center, then scroll down the window to the Windows Firewall icon. Make sure it's ON. Then, look at the Exceptions tab. That lists programs that are set up to listen for incoming internet connections. You should disable some of them a couple of times a year, just to see what happens (or see what fails to work anymore).

If you're on Windows 2000, you should definitely use a software firewall like Zone Alarm, or, my fave so far, Outpost Free. These programs have more features than the regular Windows XP firewall, but basically do the same thing. They also give you a nice overview of what traffic is active on your computer.

If you are using a DSL or cable modem service, you should also get a router. These are devices that are designed to allow more than one computer to connect to the high-speed line. They also include a simple firewall. By using one of these, you add an extra layer of security to your network.

The only negative aspect of using a router/firewall device is that some applications, like some kinds of file transfer over peer-to-peer networks, will fail, or become difficult to set up, because you have to mess with the firewall first. (If you want life to be a little easier, get one that features UPnP, or is called a "gaming router".)

Use Good Passwords

I once worked at a company that used really weak passwords. That was the first place I experienced a computer break-in. It sucked.

A good password has a combination of words, numbers, upper and lowercase letters, and maybe some punctuation. A good password can also be very long, like a complete sentence.

"5TT3err%" is a good password. "ohmanmyfingerhurts" is a good password. "password", "admin", and "ucla" are bad passwords.

There's a password quality evaluator elsewhere on this site.
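Just to illustrate the kind of checks such an evaluator makes (this is a toy sketch, not the evaluator on this site), you can score a password on character variety and length:

```python
import string

def password_score(pw):
    """Toy password scorer: one point per character class, plus length bonuses."""
    score = 0
    if any(c in string.ascii_lowercase for c in pw):
        score += 1
    if any(c in string.ascii_uppercase for c in pw):
        score += 1
    if any(c in string.digits for c in pw):
        score += 1
    if any(c in string.punctuation for c in pw):
        score += 1
    if len(pw) >= 12:
        score += 2       # long passwords (even all-lowercase sentences) score well
    elif len(pw) >= 8:
        score += 1
    return score         # 0 = terrible, 6 = decent
```

By this rough measure, "5TT3err%" scores high on variety, "ohmanmyfingerhurts" scores on length, and "password" and "ucla" score near the bottom.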

Run Backups Regularly

It's critical to have a good backup system. The computer or hard drive will fail, eventually.

There are two important aspects to doing backups painlessly: organizing your data, and organizing your backups. First, you need to organize your folders, so all your data is in one place. Windows wants you to put everything in "My Documents", and I suggest using it. Within My Documents, create a filing system of folders within folders, to organize your documents and/or work. You might make one folder per client, or one per project, or organize files by the type of file. Personally, I tend to keep one folder per client, and put projects within it.

Once your files are in one place, and organized, it's not that hard to plan a backup.

I could go on at length about backup strategies -- entire books have been written about it. The basic, simple strategy is to buy enough extra, external storage for all your files, and run a backup at least a couple times a year. If you have data that changes a lot, back that up every week or so.
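The basic "copy everything to external storage" strategy can be as small as one script. Here's a minimal sketch; the source and destination paths in the example are placeholders you'd change for your own machine:

```python
import shutil
from datetime import date
from pathlib import Path

def run_backup(source, backup_root):
    """Copy the whole documents tree into a dated folder on the backup drive."""
    dest = Path(backup_root) / ("backup-" + date.today().isoformat())
    shutil.copytree(source, dest)  # fails if today's dated folder already exists
    return dest

# Example with placeholder paths -- change both for your own machine:
# run_backup(r"C:\Documents and Settings\me\My Documents", r"E:\Backups")
```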

There's a lot of software out there to help with this, and that might be discussed on another page.

There's also a built-in backup tool, under Start Menu -> All Programs -> Accessories -> System Tools -> Backup, that performs different types of backups. I don't use it, but it's there if you wish to implement a more rigorous backup system.

Don't Click on Attachments

If you don't know why someone's sent you an attachment, don't open it with a double-click.

Instead, save the attachment to the Desktop, and open it with the appropriate application, or with Notepad.

Also, don't use Outlook Express (or Outlook if you can avoid it). Those are the most attacked programs.

Use Mozilla

Use Mozilla Firefox and Mozilla Thunderbird. They are a bit more secure than Internet Explorer and Outlook Express.

This may change as they get more popular, but today, the Mozilla programs aren't targeted nearly as much by the malware writers.

Find Alternatives

A lot of popular software has alternatives. For example, I use a (somewhat hard to install) app called GAIM instead of AIM and Yahoo Messenger. Thus, I can delete AIM and Yahoo Messenger, which both take up a lot of space, and slow down the computer more than GAIM does.

By using simpler alternatives, which use less CPU and RAM (and are usually free), you can improve your computer's overall speed.*

Here are some alternatives:

Yahoo Messenger, AIM = GAIM or Trillian
Outlook Express = Mozilla Thunderbird or Sylpheed
Windows Media, Quicktime, Real Player = Video LAN Client (sometimes)
iTunes = WinAmp Free (the smallest version)
MS Office = MS Works (which usually comes free with computers, or costs about $10 on eBay)
Photoshop = The GIMP

* A reason why this speeds up the computer is because you avoid using up all your random access memory (RAM), which is on a chip, and avoid causing the computer to use "virtual memory" (VM), which is on the disk.

Winson's Place: another good article

How to Stay Virus-Free and Speedy with Windows XP, Part 2

Recovering From An Infected System

I didn't realize how lucky I was to have avoided viruses. A system came to me with a virus that prevented users from typing in the access information to AOL's virus system, and seemed to also hide from some virus scanners. The solution is to use a "boot CD" to start up the system from the CD-ROM, and then run tools to clean off the hard disk.

Boot CDs started out on Linux, where it was not entirely unusual to set up machines to boot up (start up) into different operating systems, or even different configurations of the same operating system. The next logical step was to put the entire operating system onto the CD. This idea led to the creation of Windows Boot CDs.

The one I'm using currently is The Ultimate Boot CD for Windows, which is based on BartPE, a boot CD system. It comes preinstalled with all the free command-line virus scanners.


After one run through with the boot CD, I did a session using "F8". When you reboot into XP, start hitting the F8 key to get the menu to start Windows in "Safe Mode". Safe mode starts up Windows, but doesn't start up most of the drivers or services, thus preventing viruses from starting.

Boot into safe mode with networking, and then go to the virus scanning sites (listed above). They'll find any stray viruses. You can then remove the files manually. Easier said than done, though... Viruses know how to hide, and anti-virus tool vendors don't want to make it too easy to clean yourself.

The first tool in your arsenal is the "Search..." program from the Start Menu. Type in the filename and see if it comes up. If it does, delete the file.

If it doesn't, the virus is located in some hidden directory. That means you have to use the Command Line, cmd.exe. McAfee displays the first directory, so you can usually CD into that directory. Then, you can do a "DIR /A" to display hidden files. Using a little cut and paste, you can build the correct path for Search.

For example, one virus was detected in C:\System Volume Information\_restore{987E0331-0F01-427C-A58A-7A2E4AABF84D}. I had to dig around to build that path, but once it was in Search, it found the offending file, and it was deleted.
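If you'd rather script the hunt than cut and paste paths, a walk of the directory tree will see "hidden" folders too, since the hidden attribute only affects what Explorer shows. A minimal sketch (the filename below is a placeholder):

```python
import os

def find_file(root, target_name):
    """Walk every folder under root -- hidden ones included -- collecting matches."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        if target_name in filenames:
            hits.append(os.path.join(dirpath, target_name))
    return hits

# Example with a placeholder name: find_file("C:\\", "suspect.exe")
```

This won't see folders the OS refuses to list at all (a locked System Volume Information, for instance), so booting into safe mode still matters.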

Cleaning Up the Disk

I believe that keeping the disk clean is of dubious value, unless the system is very old. Most slowdowns are due to applications and small programs executing, consuming memory. This causes RAM to run out, and forces the system to swap to disk (that is, it saves out part of RAM to disk, and then loads up data from disk into RAM).

That said, there are some disk tools that, at the very least, look useful. They are located in Start Menu -> All Programs -> Accessories -> System Tools. Disk Cleanup compresses old files and deletes temporary files. Disk Defragmenter rearranges the blocks on the disk so that file access will be a little faster. If you're going to use them, run the cleanup first, then defragment.

Before you defragment, you may want to twiddle the virtual memory (VM) settings a little bit. Turn it down to a small size, or use no paging file if you have enough RAM. Then, defrag the disk. Then, boost the VM back to its prior size or larger. This will cause the VM page file (the file where VM is stored) to be a large, contiguous block. VM access will improve.

I've noticed that some people have slow disks, and that can kill performance upgrades. If you get a significant motherboard upgrade, it's a good idea to get a new disk that will run as fast as the built-in IDE controllers on the motherboard. If the system has PCI-X slots, get a 3.0 Gb/s SATA card and a SATA drive. This will improve booting and program loading times. Additionally, get enough RAM so you don't swap to disk. VM isn't supposed to be something you use regularly. It's there for emergencies, when you really need just a little extra space.

How to Update URLs in a MySQL Database after Moving a Site with WGET

Sometimes you need to move your old website off of a CMS, or at least archive it, and the only way is to use WGET to mirror the website. Wget downloads entire websites, turning dynamic sites into static sites. The following command would download the site:

wget -H --restrict-file-names=windows -A gif,jpg,html,tcl -np --convert-links --html-extension -rx

That would download the gif, jpg, html, and tcl files from both domains, make the URLs into Windows-compatible file names ending in ".html", and convert links into relative links, so the output folder could be moved.

That's all fine if you just want to link out to the site, but if you link to specific pages within the site, you now have to fix all the URLs. If you're using a database, and these URLs are in a table, it's not difficult to fix:

update stories set link=replace(link,'','/old-site/') WHERE tags like '%relevant%';

update stories set link=replace(link,'?','@') WHERE tags like '%relevant%';

update stories set link=concat(link,'.html') WHERE tags like '%relevant%';

Don't run those commands as-is. You have to alter them to work with your URLs. The basic idea is to replace the left part of the URL with your new site's URL, and then replace weird characters within the URL, and append '.html' to the URL.
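If your links live in application code rather than a database table, the same three-step rewrite is easy to express outside SQL. A sketch of the idea; the old URL prefix here is a made-up stand-in:

```python
def rewrite_link(link, old_prefix="http://old.example.com/", new_prefix="/old-site/"):
    """Swap the site prefix, neutralize '?', and append the .html suffix."""
    if link.startswith(old_prefix):
        link = new_prefix + link[len(old_prefix):]
    link = link.replace("?", "@")  # matches wget's --restrict-file-names=windows
    return link + ".html"
```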

Image with Transparent Caption

Here's some HTML and CSS to make an image with a transparent caption that displays over the image.

<style type="text/css">
   .caption-background {
      width: 500px;
      background-color: black;
      opacity: 0.7;
      margin-top: -80px;
      color: white;
   }
   .caption {
      vertical-align: bottom;
      font-family: Helvetica,Arial;
      margin: 0px 10px 0px 10px;
   }
   .caption H1 {
      margin: 0px;
      font-weight: normal;
   }
   .caption P {
      margin: 0px 0px 15px 0px;
      padding: 0px 0px 5px 0px;
   }
</style>
<img src="evergreen-soshiki.jpg" width="500" height="257" />
<div class="caption-background">
  <div class="caption">
    <h1>Caption caption caption caption</h1>
    <p>By Author Name | Date | N Comments</p>
  </div>
</div>

InDesign: Black Prints as Gray

InDesign Help had an article about this problem, where you think you're printing black, but it's coming out of the printer as gray.

Even worse, the blacks in the images come out black, making your gray look ugly!

The solution is to go to Edit->Preferences->Appearance of Black.

Set On Screen to "Display All Blacks Accurately".

Set Printing to "Output All Blacks as Rich Black".

Rich Black is a black produced by combining CMY inks with K. I think what normally happens otherwise is that black gets replaced with 100% K ink (black ink), which actually looks a little bit gray when viewed next to Rich Black.

(Just to confirm, I looked at some offset-printed pages on glossy paper. Indeed, black ink looks lighter than black ink combined with another ink! You can see this by comparing a black graphic with a color graphic overlaid with some black text. The black text doesn't knock out the color ink -- the software is probably trying to avoid registration problems that would show up as white edges on the letters.)

Once you do that, InDesign seems to convert all blacks to 100% K. But if you've manipulated the color yourself, you'll have to adjust it manually. That's what worked for me.

Indenting Styles for C-Style Code

People get into all kinds of gripey little snits about how to indent code. Whenever you start a project, it's pretty important to nail down indentation, because it's one of those personal preferences that becomes "a big issue" when there's a conflict. Usually, the indentation is a non-issue, but it's something to fight about instead of discussing the real underlying issues, like interpersonal communication problems.

So, let's catalog some styles, and discuss:

if (a==b) {
    c();
}

That's the standard Java style. It's pretty compact.

if (a==b)
{
    c();
}

That's the standard C style, and it puts a little extra whitespace in there. It's my favorite style, because it is the easiest to read.

if (a==b)
    {
    c();
    }

I think I saw that in Code Complete, a very good book about programming style, by Steve McConnell. I recommend the book, but not this indentation style. It's nice that the braces are aligned with the code... but it leaves the "if" way out there.

if (a==b) {
    c();
    }

Hmmmm... That's a variation of the previous one. I don't like it. It's not irrational, but, still.

if (a==b)
{
    c();
}
else
{
    d();
}

This is my preferred style. I hate the way the else sits there, but only when I'm typing it. When I come back to read the code, it seems nice and airy. It also works well with "else if (...)"

switch ($a) {
case 'a':
    foo();
break;
case 'b':
    bar();
break;
}

This is a different statement. I didn't like putting the break to the left edge at first, but now like it, because it highlights that the program won't continue into the next block.

switch ($a) {
    case 'a':
        foo();
        break;
    case 'b':
        bar();
        break;
}

That was my old style. I like how the case statements stand out, but, now, it's hard to just allow the code from one block to continue into a different block. You could do it, but it'd be hard to notice, and could lead to some nasty bugs.

switch ($a)
{
case 'a':
    foo();
    break;
case 'b':
    bar();
    break;
}

This is like the spaced-out style I like, but I'm still not used to dropping the bracket onto the next line. Maybe it'll make sense, eventually.

a = 1;
cat = 2;
dog = 100;

I saw this in some Visual Basic code. It looked cool, but adding more text to the block looked tedious.

a = 1;
cat = 2;
dog = 100;

That's my style. Lazy.

a      = 1;
cat    = 2;
hotdog = 100;

A lot of people are into aligning the equals sign. It seems like a lot of work to me, especially if you have very_long_variable_names.

Every language has its own subtle rules, because every language has unusual features that may or may not translate well to the screen. For example, in Perl:

map {
} @array;

I like to use that, but if I were going to be more uptight:


That's not quite right, in my opinion. The code block isn't really just a code block -- it's passed as an argument to map. Perl lets you pass functions as arguments, and the map command will apply the function to the array. The function is defined in-line. Another form of map is written like this:

map(funcname, @array);

Another way to write it is to use the block again:

map { funcname($_) } @array;
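For comparison, the same two forms exist in Python's map, which might make the distinction clearer (this sketch isn't from the original discussion):

```python
def double(x):
    return x * 2

nums = [1, 2, 3]

# the named-function form, like Perl's map(funcname, @array)
a = list(map(double, nums))

# the in-line form, like Perl's map { ... } @array
b = list(map(lambda x: x * 2, nums))
```

Either way, the function is an argument to map, which applies it to each element of the list.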

These alternate uses should influence your most wordy, indented style: you don't want the "big" style to look that different from the "small" style. If the constructs are similar, they should look similar.

Installing R Packages Globally (for rApache)

For general instructions, see:

In Ubuntu Linux, the path to the global libraries is: /usr/local/lib/R/site-library/

To install there, you can do install.packages(c('foo'), '/usr/local/lib/R/site-library/')

or take advantage of the built-in variable: install.packages(c('foo'), .Library.site[1])

Check that .Library.site has the values you need.

You can also use R CMD INSTALL -l /path/to/library foo

(It didn't work for me... :( )

Below is a story about installing globally from source:

I was trying to run some Rook code in rApache, and discovered (via RApacheInfo, r-info) that the package wasn't attached. Not being that familiar with either, I figured I needed to install a package globally.

The right way is described at stackoverflow by Dirk Eddelbuettel. littler is a scripting front end for R, so you can write R scripts as if they are regular scripts. (Normally, you need to go through the trouble of using here files.)

Install littler

apt-get install littler

I copied the example scripts into my local bin

cp /usr/share/doc/littler/examples/* ~/bin

Then installed Rook

sudo ~/bin/install.r Rook

Restarted Apache

sudo service apache2 restart

Then, went back to the RApacheInfo page to look at the libraries. Rook was there! Yay!

But going back to the URL with the Rook script failed.

Tailing the server logs showed that rCharts wasn't installed.

So I then tried to install rCharts.

Didn't work!

Had to do this:

sudo -s
cd /usr/local/lib/R/site-library
R CMD INSTALL -l . master.tar.gz
service apache2 restart
# and then when it works
rm master.tar.gz

Turned out I needed more packages installed. Run these as root (or as recommended in the link, as a member of the staff group):

~/bin/install.r plyr
~/bin/install.r RJSONIO
~/bin/install.r whisker
~/bin/install.r yaml
~/bin/install.r zoo
~/bin/install.r DBI
# the following might require the libmysqlclient-dev package
~/bin/install.r RMySQL
# the next one doesn't work for R 3.0
~/bin/install.r devtools

Note: I haven't cleaned up my script and some of those libraries are extraneous... sorry.

All this stuff isn't automated, so you should paste it into a script. You'll need to run the update.r script later to update your packages.

Once that was done, the script could run a Hello, world program.

Getting the database going was a whole other task.

The rApache Config Lines
These follow the tutorial at the rApache site.

  <Location /RApacheInfo>
    SetHandler r-info
  </Location>
  <Location /RToeChart>
    SetHandler r-handler
    RFileEval /home/johnk/Dropbox/www/foobar/firstplotrapache.R:Rook::Server$call(app)
  </Location>

The MySQL cnf file
There are several ways to pass password info to the application, but the way I like is MySQL options files, aka the my.cnf file. In Debian systems, they are in the files /etc/mysql/conf.d/*.cnf. Become root. Create a file called foobar.cnf:

[client]
user = user
password = *****
host = localhost
port = 3306
protocol = TCP
database = foobar

[rs-dbi]
database = foobar

Then you have to set the file owner and mode:

chown www-data /etc/mysql/conf.d/foobar.cnf
chmod go-rw /etc/mysql/conf.d/foobar.cnf

That's my setup. I don't think the rs-dbi section is required, but I have it there as a fallback.

Intel Motherboard Computer Crashes Without BSOD

We got these new computers at work, and for some reason, mine was crashing.

Since I made the computer selection, I chose Intel mainboard systems. Sysadmins like Intel motherboards, but they are nearly invisible in the marketplace, and not favored by either gamer screwdriver shops or mass manufacturers. Intel sells some mobos to the mass makers, but you also see other brands like Asus, ECS, and MSI in a lot of boxes. So I went with Bytespeed, a small screwdriver shop servicing school systems, that only uses Intel motherboards.

The problem was, the computer crashed, and in an unusual way. The screen would get "noise" that looked like an old TV not latching onto a signal. It would crash, and there was never a BSOD or a crash log. (That's the price you pay for getting a "fast" computer rather than a midrange one - instability.)

Typically, I like to diagnose the issue rather than get immediate warranty service. For one, by waiting it out, you improve your odds of getting a more debugged product. Send it back immediately, and you're still pulling from a potentially faulty batch of parts. Aside from that, it's entirely possible that it's not the hardware. So I worked slowly to diagnose. (Also, the company's 5-year warranty gives you a lot of leeway.)

The first things I tried were easy - replacing the keyboard and mouse, in case they were flaky. That didn't fix it.

Next, I disconnected my cell phone from the USB port. Again, no luck.

Finally, the computer was moved, another computer was brought in, and I used Remote Desktop to access the original machine. The system went super-stable. It never crashed, and never disconnected. The terminal computer also never failed (another new Bytespeed).

My theory shifted: it could be the monitor. The monitor was an old IBM CRT with very good color. Lots of range. It's also from the late 1990s. Being a decent monitor, it had Plug-and-Play: a signal that tells the computer it's an E94, and what the optimal resolutions are.

So, after a couple weeks of Remote Desktop, the computers were re-arranged again, and an older Dell monitor attached to the computer. The system remained stable.

To confirm that it's the monitor that's causing trouble, I'll have to reconnect it at some point, and see if it causes crashes.

A side note - during the computer setup, I had a lot of problems with older USB devices plugged into the USB 3 ports. They caused crashes. So it's possible something plugged into the USB 3 ports was causing the earlier crashes. However, given that these USB crashes generally resulted in the keyboard or mouse freezing up, I don't think the USB ports caused the specific crash I was having before. (Now I understand why computer vendors sell computer systems with peripherals - less trouble.)

Through all this, Bytespeed has been good. They're always contacting me about the status of this computer. They have competent tech support - around as good as the Dell business-class tech support (which is really good imho).

This computer problem, if you consider it, could not have been solved by regular tech support. The problem, I'm assuming, was this old monitor, which nobody is going to have. What it took was a technical person experimenting to discover the problems and quirks of the hardware.

Also, I don't consider switching to another mobo brand, except maybe Asus, to be an option. I've had too much trouble with the other brands. Support is generally nonexistent after the first couple years - the churn of boards and features is impressive, but scary too. They use less 'leet parts. With Intel, you pay more, but get what are considered better parts, and another year or so of driver updates.

Javascript Calculator: Split Up Your Receipts

Here's a Javascript calculator I put together to deal with situations where you have to split up a grocery receipt with a friend. You can type in the prices, one per line. Check the box if it's a taxable item. (Set the tax rate if it's not 8.25%.) Then, click the "+" button to add it up.

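The interactive form isn't reproduced here, but the arithmetic behind the "+" button is simple. A sketch of the same logic (the item list and rate in the example are made up):

```python
def receipt_total(items, tax_rate=8.25):
    """Sum a list of (price, taxable) pairs, taxing only the flagged items."""
    total = 0.0
    for price, taxable in items:
        if taxable:
            price = price * (1 + tax_rate / 100)
        total += price
    return round(total, 2)

# Example: one taxable item and one untaxed one, at the default 8.25% rate:
# receipt_total([(100.0, True), (5.0, False)])
```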

Josh's 3-Column Layout in CSS

Josh Haglund came up with an awesome way to do a 3-column layout in CSS.

Let's suppose you have three DIVs, arranged into three columns with the float:left and float:right styles. (Chances are, if you're reading this, you know what this is. If not, Google some other pages, and see what others do.) The common problem (aside from learning to use floats) is that the columns aren't all the same height.

The quick solution is to create a background image that looks like the 3-column layout. If it's a simple layout, then you should be able to use a 1-pixel tall, very wide line, repeated several hundred times, to create the columns. Put that skinny gif into a DIV via a background-image:url(skinny.gif).

Then, within this DIV, you have the layout. Wherever a column is short, the background image displays, making it look like the column extends to the bottom of the layout.

For best results, make the layout first, then create the background image.

Kindle Tricks (Linux)

An Amazon press release said that they sold more Kindle books than paper books. That might be true, but, they probably included the thousands of books being sold for free, or a few dollars. There are numerous public domain books "for sale" on Kindle. I downloaded several dozen.

Here are a few Amazon Kindle tricks.

If you take the clear plastic protector sheet that's stuck on the front, and stick it to the back, the cold metal back of the Kindle won't touch your fingers, and your hand will stay warmer.

The amazon tag "kindle freebies" will bring up all the free books you can get. Be careful, because a lot of free books are being sold for $1 to $3. The site will also send you free books, but Amazon will charge 15 cents per book for the download (unless you copy it via USB).

Mobigen.exe works in WINE. The linux_mobigen is no longer on the mobi site, it seems, and the random one floating around requires a library that doesn't come with my distro of Ubuntu (I think the lib is an older version).

A couple aliases that could help:

alias kindle="sudo eject -t /dev/sdb"
alias mobigen="wine ~/bin/mobigen.exe -verbose"

The kindle command will cause the device to be mounted. That way, you don't have to keep unplugging the USB cable to get the Kindle to show up as a disk.

The mobigen command just runs mobigen from your bin directory.

mobigen will convert plain HTML files into .mobi files, which can be read on Kindle. That's the good news. The bad news is that most web pages have a lot of Javascript on them, so you need to view the printer-friendly version of the page, and convert that instead.

The verbose option seems to help it make the files.

If you keep getting the "can't make temporary file" error, try this:

First, run "wine cmd.exe". That gives you a DOS style shell.

Then type "bin/mobigen.exe File-to-convert.html", within the DOS shell.

That works for me. You just don't get to use all the file-name completion features.

An interactive ebook authoring tool is eCub by Julian Smart, who made wxWidgets. (Haven't tried it yet.) It uses the mobigen tool to generate the .mobi file.

The .mobi format kind-of sucks. It's a proprietary binary format without an open source implementation. It would be nice if Kindle had support for the .epub format. It would make it a little easier to do things like convert web pages into books, and copy them onto the reader.

I guess Amazon is using the iTunes model here. The simple-to-use pathways are all proprietary and have DRM, and making it easy to load other content onto the reader, while, possible, is not a priority. This may help authors and Amazon make some money now, but it could harm the utility of the Kindle in the future, because competing readers have .epub support.

People report that Gmail works well with Kindle. It's kind of clunky.

LPIC-1 Exam Self-Cram Notes

I was looking around and stumbled across an article about the LPI exams, which are generally considered the best of the many certs out there. That is to say, they are the toughest. It turns out there are a bunch of people selling old tests. LPI also has a cram course, and is going to be proctoring tests at SCALE, at a discount. I'm not sure I can handle the LPIC, but this article is an attempt at self-assessment, and can be used as a study guide.

I'm going to copy the content from the following page, and use it as an outline to flesh out:

The layout on this drupal install is screwed up, so you can see the original text here at LPIC-1 Exam Cram.

System Architecture

Determine and configure hardware settings

Key Knowledge Areas

Terms and Utilities

Laptops are a Virus Risk: How to Email Safely

It's been seven years since the "I LOVE YOU" email virus of 2000, but these email viruses still manage to infect people. More importantly, email-based trojans are still being used to launch more complex and subtle attacks. (See Timeline of notable computer viruses and worms.)

A contemporary high-risk scenario involves laptops that leave the office, and become home computers in the evening.

Office networks generally have some form of malware detection and quarantine. More sophisticated sites run centralized file scanning and email scanning, combined with restricted user access, to reduce the impact of malware. So, within the office network, when a recognized virus appears, it's contained, and doesn't have the opportunity to destroy the network.

Outside of the office, though, tight security is a lot less common. Computers connected to the internet are attacked, relentlessly, by armies of "zombied" computers. Email malware floods into mailboxes.

Avoiding the Plague

One way to avoid the risk of the plague of malware is to modify your computer use so that it's a less inviting target. The following techniques will reduce your risk.


Use a Less Popular Email Client

If you use Outlook for work, don't use it for your personal email. The Outlook and Outlook Express email clients are the most popular targets for virus-writers. They know that everyone gets a free copy of either one (or both) with their new computer. They also know it's hard to disable Outlook Express.

By using one of the less popular email applications, you deprive the viruses of the "environment" to spread. Some popular clients are Thunderbird, Sylpheed, Pegasus, and Eudora.

Turn on Anti-Virus at the ISP

If you're using your ISP-provided email address, you should find out if they offer anti-virus scanning. If so, you should turn that feature on. If they charge for it, you should consider paying, or switching to another email service.

Use Webmail

The big webmail sites do virus scanning. Hotmail, Yahoo, and Gmail can scan your messages for viruses. These aren't totally risk-free, but they are safer than nothing.

Loop Faster

nzakas has a great presentation about speeding up Javascript loops, but it applies to any language that uses C-like loop structures.

The first principle is not to call a function in the comparison, if the compared value doesn't change. (This is pseudocode by the way.) Bad:

for( i=0; i < a.length(); i++)

Good:

len = a.length();
for( i=0; i < len; i++ )

The second principle is to count down rather than up. This is better:

l = a.length() - 1;
for( ; l >= 0; l-- )

The next optimization should be obvious:

l = a.length();
for( ; l-- ; )

And since we're not initializing or testing:

l = a.length();
while( l-- )

The speedup in interpreted languages is huge, but even in compiled languages, there are speedups because there's typically a "not equal to zero" instruction, or something that can leverage a comparison to zero.

Additionally, this code is easier to debug once you understand the idiom.
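Python has no C-style for loop, but the countdown idiom translates directly to a while loop. A small sketch (not from the presentation):

```python
items = ["a", "b", "c"]

visited = []
i = len(items)
while i:        # keeps looping while i is non-zero, like C's while (l--)
    i -= 1      # decrement first, then use the index
    visited.append(items[i])
# visited now holds the items in reverse order
```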

MS Access VBA: Error -2147217900 (80040e14)

Jawahar on Expertsforge says this is an SQL syntax error where a keyword is used as a field name.

In Access, the app finds these keywords and quotes them before running the query. It's all done behind the scenes, but you can expose this feature through the query design tool.

Create a new query in design view. Bring up the SQL view. Paste your SQL in there. (You are probably already at this point, testing your SQL and knowing it works.)

Go to the Design view again. Then, go to the SQL view. Access should have added some parentheses and square brackets. The square brackets are used to quote keywords.

You can then fix your code by quoting your keywords. (Use the backtick (`) instead of square brackets to be more normal.)
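The bracket-quoting step that Access does behind the scenes can be mimicked before you ever send the SQL. A toy sketch, with a deliberately tiny, made-up keyword list (Access has many more reserved words):

```python
# Illustrative only: a few field names that collide with SQL keywords.
RESERVED = {"password", "date", "value", "name"}

def quote_field(field):
    """Bracket a field name if it collides with a reserved word."""
    return "[" + field + "]" if field.lower() in RESERVED else field

def build_select(fields, table):
    return "SELECT " + ", ".join(quote_field(f) for f in fields) + " FROM " + table
```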

MS Access, Outlook: recording bounced email addresses

This is a subroutine that will scan your Outlook inbox or a subfolder of inbox named "Bounces", and copy bounced email addresses to a MS Access database.

It will then join the table of bad addresses to another table (of people, presumably) and null out the bad addresses, so you won't send to them again.

This code is pretty jacked up, but it works for my specific configuration, which is Outlook as the client and Exchange as the server. Many addresses won't be detected, because Exchange removes the internet email address, substituting the user's real-world name instead. For those, you'll have to remove the addresses manually.

(The problem here is "indirection". Outlook and Exchange try to hide the ugly internet email addresses, and use a more complex system that allows you to use the user's real name, and have it resolve to a record in a directory. That record contains the real address, whether it's an X.400, internet, or Exchange address. The problem with this is roughly the same problem people have with phones, when they use speed dial or memory dial all the time -- they forget the underlying phone number. In this situation, with the email address, it's the server deliberately losing the underlying email address.)

Public Sub CopyBouncedAddressesToDatabase()
    Dim conn As New ADODB.Connection
    Dim cmd As New ADODB.Command
    Dim rs As New ADODB.Recordset
    Dim AccessConnect As String
    AccessConnect = "Driver={Microsoft Access Driver (*.mdb)};" & _
                    "Dbq=DATABASE.mdb;" & _
                    "DefaultDir=C:\DATABASE;"
    conn.Open AccessConnect
    Dim inbox As Outlook.MAPIFolder
    Dim bounces As Outlook.MAPIFolder
    Dim mail As Variant
    Dim body As String
    Dim lines As Variant
    Dim address As Variant
    Dim addressarray As Variant
    Set inbox = Outlook.Application.GetNamespace("MAPI").GetDefaultFolder(olFolderInbox)
    On Error GoTo NoBounces
    Set bounces = inbox.Folders.item("Bounces")
    On Error GoTo 0

    ct = bounces.Items.Count
    For i = ct To 1 Step -1
        Set mail = bounces.Items(i)
        lines = Split(mail.body, vbCrLf, 50)
        If UBound(lines) > 7 Then
            If lines(1) = "I'm afraid I wasn't able to deliver your message to the following addresses." _
                And InStr(lines(4), "@") Then
                    ' matches qmail bounces
                    address = Mid(lines(4), 2)
                    address = Left(address, Len(address) - 2)
                    conn.Execute "INSERT INTO tmpBouncingEmails (`email`) VALUES ('" & address & "')"
            ElseIf lines(0) = "Your message did not reach some or all of the intended recipients." _
                And InStr(lines(7), "@") Then
                    ' matches exchange bounces
                    address = LTrim(lines(7))
                    addressarray = Split(address)
                    address = addressarray(0)
                    address = Replace(address, "'", "")
                    conn.Execute "INSERT INTO tmpBouncingEmails (`email`) VALUES ('" & address & "')"
            ElseIf lines(0) = "Your message did not reach some or all of the intended recipients." _
                And (InStr(lines(9), "unknown user account>") _
                        Or InStr(lines(9), "User unknown>") _
                        Or InStr(lines(9), "No such user") _
                        Or InStr(lines(9), "Address rejected") _
                        Or InStr(lines(9), "Invalid recipient") _
                        Or InStr(lines(9), "User account is unavailable") _
                        Or InStr(lines(9), "Addressee unknown") _
                        Or InStr(lines(9), "Unable to deliver to") _
                        Or InStr(lines(9), "smtp;550") _
                    ) Then
                    ' matches exchange bounces
                    address = LTrim(lines(9))
                    addressarray = Split(address)
                    offs = 1
                    For offs = 1 To UBound(addressarray)
                        If InStr(addressarray(offs), "@") Then Exit For
                    Next offs
                    If offs <= UBound(addressarray) Then
                        address = addressarray(offs)
                        address = Replace(address, "...User", "")
                        address = Replace(address, "'", "")
                        address = Replace(address, "<", "")
                        address = Replace(address, ">:", "")
                        address = Replace(address, ">...", "")
                        address = Replace(address, ">", "")
                        address = Replace(address, "(", "")
                        address = Replace(address, ")", "")
                        conn.Execute "INSERT INTO tmpBouncingEmails (`email`) VALUES ('" & address & "')"
                    End If
            ElseIf lines(1) = "Unable to deliver message to the following address(es)." _
                And InStr(lines(4), "@") Then
                    ' matches first bounce in a bounce
                    address = LTrim(lines(4))
                    addressarray = Split(address)
                    address = addressarray(7)
                    address = Replace(address, "(", "")
                    address = Replace(address, ")", "")
                    conn.Execute "INSERT INTO tmpBouncingEmails (`email`) VALUES ('" & address & "')"
            ElseIf lines(0) = "Your message did not reach some or all of the intended recipients." _
                And (InStr(lines(9), "User account is overquota") Or _
                        InStr(lines(10), "User account is overquota")) Then
                    ' just ignore this message - account is good
            ElseIf lines(0) = "Your message did not reach some or all of the intended recipients." Then
                    ' at this point, we don't have an address for them
                    ' so we'll just log their outlook contact name or something
                    ' fixme
            End If
        End If ' UBound(lines) > 7
    Next i
    ' null out the bouncing email addresses
    conn.Execute "UPDATE tmpBouncingEmails INNER JOIN tblPeople ON tmpBouncingEmails.email = tblPeople.Email SET tblPeople.Email = Null"
    ' clear out the temporary table
    conn.Execute "DELETE * FROM tmpBouncingEmails"
    Exit Sub
' called if the bounces folder does not exist
NoBounces:
    Set bounces = inbox
    Resume Next
End Sub

MS Excel: Address Cleanup Macros

Here are some Excel macros that help to clean up data. Once cleaned, it's easier to remove duplicates. (I used these to de-dupe a list exported from Outlook.)

Included is a rough version of MS Access' Nz() function.
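The core transformation is simple enough to sketch outside VBA. A rough Python equivalent of the email cleanup (the sample address is hypothetical):

```python
import re

def simplify_email(cell):
    # "Joe Blow (joe@example.com)" -> "'joe@example.com"
    # The leading apostrophe matches what the macro writes back to the cell.
    m = re.search(r"\(([^)]*)\)", cell)
    if m:
        return "'" + m.group(1)
    return cell
```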

Public Sub SimplifyEmails()
    ' This subroutine scans a column, turning emails in this form:
    '   Joe Blow (joe@example.com)
    ' Into this form:
    '   'joe@example.com

    Dim Rng As Range
    Set Rng = Application.Intersect(ActiveSheet.UsedRange, _
        Application.Selection) ' assumes the column to scan is selected
    Col = Rng.Column
    N = 0
    For R = Rng.Rows.Count To 2 Step -1
        V = ActiveSheet.Cells(R, Col).Value
        ' Debug.Print V
        If V <> Empty Then
            If Nz(InStr(V, "(")) < Nz(InStr(V, ")")) _
              And Nz(InStr(V, "(")) > 0 Then
                Start = InStr(V, "(") + 1
                Length = InStr(V, ")") - Start
                newmail = "'" & Mid(V, Start, Length)
                Debug.Print newmail
                ActiveSheet.Cells(R, Col).Value = newmail
            End If
        End If
    Next R

End Sub

Function Nz(a As Variant) As Variant
    ' rough version of Access' Nz(): map Null to a harmless default
    If IsNull(a) Then
        Nz = 0   ' callers here use the result numerically
    Else
        Nz = a
    End If
End Function

Public Sub NormalizePhones()

    Dim Rng As Range
    Set Rng = Application.Intersect(ActiveSheet.UsedRange, _
        Application.Selection) ' assumes the column to scan is selected
    Col = Rng.Column
    N = 0
    For R = Rng.Rows.Count To 2 Step -1
        V = ActiveSheet.Cells(R, Col).Value
        ' Debug.Print V
        If V <> Empty Then
            ' first replace . with -
            V = Replace(V, ".", "-")
            ' second if there's a dash in position 4, then turn it into parens
            If InStr(V, "-") = 4 Then
                V = "(" & Mid(V, 1, 3) & ") " & Mid(V, 5)
            End If
            ' third strip any double spaces (replace with single space)
            V = Replace(V, "  ", " ")

            ' fourth if there's a space in position 4, then turn it into parens
            If InStr(V, " ") = 4 Then
                V = "(" & Mid(V, 1, 3) & ") " & Mid(V, 5)
            End If
            ActiveSheet.Cells(R, Col).Value = V
        End If
    Next R

End Sub
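For comparison, here are the same normalization steps sketched in Python (mind that VBA's InStr is 1-based where Python's find is 0-based):

```python
def normalize_phone(v):
    v = v.replace(".", "-")          # first: dots become dashes
    if v.find("-") == 3:             # dash in the 4th position -> area code
        v = f"({v[:3]}) {v[4:]}"
    v = v.replace("  ", " ")         # collapse double spaces
    if v.find(" ") == 3:             # space in the 4th position -> area code
        v = f"({v[:3]}) {v[4:]}"
    return v

print(normalize_phone("555.123.4567"))
# (555) 123-4567
```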

Public Sub TrimAllCells()
    ' removes leading and trailing spaces, and replaces double-spaces with single spaces
    Dim Rng As Range
    Set Rng = ActiveSheet.UsedRange
    Col = Rng.Columns.Count
    N = 0
    For R = Rng.Rows.Count To 2 Step -1
        For C = Col To 2 Step -1
            V = ActiveSheet.Cells(R, C).Value
            If V <> Empty Then
                ' strip any double spaces (replace with single space)
                V = Replace(V, "  ", " ")
                ' ltrim and rtrim the data
                V = LTrim(V)
                V = RTrim(V)
                ActiveSheet.Cells(R, C).Value = V
            End If
        Next C
    Next R

End Sub

MS Access: Application Configuration Settings in Tables

This is a relational way to store application configuration in a table. It uses two tables, so you can store multiple configurations, use the tool over and over, and still retain the old settings. One table stores the configurations; the other stores a single row naming the configuration currently in use.

Setting values are retrieved from the configuration tables with queries like this:
(SELECT PreRegActivityID FROM Congress7_Config WHERE ID=(SELECT CurrentConfigID FROM Congress7_CurrentConfig))
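Here's the pattern demonstrated end-to-end with Python's sqlite3 standing in for Jet (table and column names follow the example above; the values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Congress7_Config (ID INTEGER PRIMARY KEY, PreRegActivityID INTEGER);
CREATE TABLE Congress7_CurrentConfig (CurrentConfigID INTEGER);
INSERT INTO Congress7_Config VALUES (1, 100), (2, 200);
INSERT INTO Congress7_CurrentConfig VALUES (2);
""")
row = conn.execute(
    "SELECT PreRegActivityID FROM Congress7_Config "
    "WHERE ID=(SELECT CurrentConfigID FROM Congress7_CurrentConfig)"
).fetchone()
print(row[0])
# 200 -- the setting from configuration 2, the one currently in use
```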

AccessAppConfiguration.jpg (44.24 KB)

MS Access: Automatically Jumping to the Only Record that Matches

Many years back, just before web pages got popular, I remember that some programs sent you as close as possible to your desired data whenever you searched. If you typed a search term, and only one record matched, you'd be taken to that record.

I have been using an Access db at work that doesn't have this feature. It's kind of a pain, because when you search, you sometimes get result lists with only one record, or no records at all. Below is code that will take you straight to the record if you type in a search term that's specific enough.

There's no magic shortcut here. You have to "peek" into the results to count the number of records your search will bring up, and behave accordingly.

There's also some logic to distinguish between searches for full names and last names. It's another way to refine the search quickly.

(BTW, you can't just drop this code into your project. You have to study it and replicate the logic for your own system. Sorry, lazy programmers.)

Here's some code to do that:

Private Sub ActFilter_AfterUpdate()
    On Error GoTo Err_ActFilter_Click

    Dim stDocName As String
    Dim stLinkCriteria As String
    Dim f As String
    Dim first, last As String
    Dim offset As Long
    Dim dbs As Database
    Dim rst As Recordset
    Dim fedid As Variant
    Set dbs = CurrentDb
    ' if they type both first and last name, try to match on both
    f = LTrim(RTrim([ActFilter]))
    offset = InStr(1, f, " ")
    If (offset > 0) Then
        first = Left(f, offset - 1)
        last = Mid(f, offset + 1)
        stLinkCriteria = "[FName] Like " & SQuote(first & "*") & _
           " AND [LName] Like " & SQuote(last & "*")
    Else
        stLinkCriteria = "[LName] Like " & SQuote(f & "*") & _
           " OR Email Like " & SQuote(f & "*")
    End If
    ' peek into db to see if records exist
    Set rst = dbs.OpenRecordset("SELECT FEDID FROM tblActivists WHERE " & stLinkCriteria)
    ' if no records exist, don't show results
    If rst.EOF Then
        MsgBox "Nobody matches."
        Exit Sub
    End If
    ' count how many results there are.  if only 1, then jump to the record
    rst.MoveLast ' DAO RecordCount is only accurate after reaching the last row
    If (rst.RecordCount = 1) Then
        fedid = rst.Fields("FEDID")
        ActFilter = ""
        DoCmd.OpenForm "frmActivists", , , "[FEDID] = " & fedid
        Exit Sub
    End If
    ' if we have more than one record, show a list of records
    stDocName = "frmActivList"
    ActFilter = ""
    DoCmd.OpenForm stDocName, , , stLinkCriteria
Exit_ActFilter_Click:
    Exit Sub

Err_ActFilter_Click:
    MsgBox Err.Description
    Resume Exit_ActFilter_Click
End Sub
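The peek-then-dispatch idea, sketched in Python with sqlite3 standing in for the Access form plumbing (table and column names are borrowed from the code above; the returned tags are just placeholders for what the UI would do):

```python
import sqlite3

def search_action(conn, term):
    rows = conn.execute(
        "SELECT FEDID FROM tblActivists WHERE LName LIKE ?", (term + "%",)
    ).fetchall()
    if not rows:
        return ("message", "Nobody matches.")             # no results: say so
    if len(rows) == 1:
        return ("open_record", rows[0][0])                # one result: jump straight there
    return ("open_list", sorted(r[0] for r in rows))      # many: show the list

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblActivists (FEDID INTEGER, LName TEXT)")
conn.executemany("INSERT INTO tblActivists VALUES (?, ?)",
                 [(1, "Smith"), (2, "Smythe"), (3, "Jones")])
print(search_action(conn, "Jon"))
# ('open_record', 3)
```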

MS Access: Comparing Queries Between Two Databases (a query diff)

Often, when you have MS Access in a small office, and have done the right thing and split the database into a backend of tables and a frontend of queries, reports, and forms, you end up with changes to objects in multiple files. Queries are the trickiest to compare, because the query object's modification date changes if even a column width is changed, so you need to dig deeper and compare the SQL itself.

This code below compares the local queries to queries in another database.

In order to use it, you need to link the remote MSysObjects table. Call it MSysObjects-REMOTE-mdb. That's because we get lists of queries by dumping them from the hidden MSysObjects table rather than via the APIs. This way, we get all the queries.

You also need to create a table tblMultiMDBQueryComparison with the following fields: DBName text, ObjName text, ModDate datetime. We dump the query object info into this table first, then generate a temporary report from it.

Normally, I wouldn't post code that, imnsho, is so crappy, but there were a number of people online requesting a tool that does this, or something similar, like comparing object modification dates.

Part of the reason it's so screwed up looking is that it uses both DAO and ADO. It's cut-and-pasted from the www and my past work.

What's interesting is that DAO will always return the SQL for a query, but ADO will not. ADO doesn't return queries (called commands) when the underlying SQL contains a bug. "This isn't a bug, it's a feature." You could hack this to point the "remote" db back to the local db, and find all the buggy queries.
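Stripped of the DAO/ADO plumbing, the comparison itself is just a walk over two name-to-SQL maps. A sketch in Python:

```python
def diff_queries(local, remote):
    # local/remote map query name -> SQL text (use None for unreadable SQL,
    # as ADO behaves with buggy queries); report names whose SQL differs
    # in the two files, or exists in only one of them
    report = []
    for name in sorted(set(local) | set(remote)):
        if local.get(name) != remote.get(name):
            report.append(name)
    return report
```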

Sub DiffQueries()
    Dim db As DAO.Database
    Dim rst As DAO.Recordset
    Dim qdf As DAO.QueryDef
    Dim q As DAO.QueryDef
    Dim cn As ADODB.Connection
    Dim rstNames As ADODB.Recordset
    Dim localdb As ADODB.Connection
    Dim remote As ADODB.Connection
    Dim cat As ADOX.Catalog
    Dim v As ADOX.View
    Dim cmd As ADODB.Command
    ' Use this as a model for dumping objects into the table.
    s = "INSERT INTO tblMultiMDBQueryComparison ( DBName, ObjName, ModDate ) " & _
     "SELECT 'LOCAL' AS DBName, MSysObjects.Name AS ObjName, MSysObjects.DateUpdate " & _
     "FROM MSysObjects WHERE ((MSysObjects.Type)=5) "
    Set db = CurrentDb
    ' Load the local objects
    db.Execute ("DELETE FROM tblMultiMDBQueryComparison")
    db.Execute s

    s = "INSERT INTO tblMultiMDBQueryComparison ( DBName, ObjName, ModDate ) " & _
     "SELECT 'mdb' AS DBName, MSysObjects.Name AS ObjName, MSysObjects.DateUpdate " & _
     "FROM `MSysObjects-REMOTE-mdb` as MSysObjects WHERE ((MSysObjects.Type)=5)"
    db.Execute s
    db.Execute "DELETE FROM tblMultiMDBQueryComparison WHERE ObjName LIKE '~*'"
    ' Create a table of object names.
    On Error Resume Next
    db.Execute "drop table tmpMultiMDBQueryComparison"
    db.Execute "create table tmpMultiMDBQueryComparison " & _
     "(ObjName text, LOCAL datetime, LOCALQuery memo, mdb datetime, mdbQuery memo, Newest text)"

    ' just in case the drop fails, and the table exists
    db.Execute "DELETE FROM tmpMultiMDBQueryComparison"

    s = "INSERT INTO tmpMultiMDBQueryComparison (ObjName) SELECT DISTINCT ObjName FROM tblMultiMDBQueryComparison"
    db.Execute s
    Set cat = New ADOX.Catalog

    Set localdb = CurrentProject.Connection ' Connect to current database.

    On Error GoTo AdoError
    Set remote = New ADODB.Connection
    remote.ConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
                     "Data Source=C:\PATH\DATA.mdb;"
    Set cat.ActiveConnection = remote
    Set rst = db.OpenRecordset("tmpMultiMDBQueryComparison", dbOpenTable)
    On Error GoTo 0

    While (Not rst.EOF)
        qName = rst.Fields("ObjName")
        rst.Edit
        For Each q In CurrentDb.QueryDefs
            If q.Name = qName Then
                rst.Fields("LOCALQuery").Value = q.sql
                rst.Fields("LOCAL").Value = q.LastUpdated
            End If
        Next q
        For Each v In cat.Views
            If v.Name = qName Then
                Set cmd = v.Command
                rst.Fields("mdbQuery").Value = cmd.CommandText
                rst.Fields("mdb").Value = v.DateModified
            End If
        Next v
        rst.Update
        rst.MoveNext
    Wend
    Exit Sub

AdoError:
       i = 1
       On Error Resume Next

       ' Enumerate Errors collection and display properties of
       ' each Error object (if Errors Collection is filled out)
       Set Errs1 = remote.Errors
       For Each errLoop In Errs1
        With errLoop
           strTmp = strTmp & vbCrLf & "ADO Error # " & i & ":"
           strTmp = strTmp & vbCrLf & "   ADO Error   # " & .Number
           strTmp = strTmp & vbCrLf & "   Description   " & .Description
           strTmp = strTmp & vbCrLf & "   Source        " & .Source
           i = i + 1
        End With
       Next errLoop

       ' Get VB Error Object's information
       strTmp = strTmp & vbCrLf & "VB Error # " & Str(Err.Number)
       strTmp = strTmp & vbCrLf & "   Generated by " & Err.Source
       strTmp = strTmp & vbCrLf & "   Description  " & Err.Description

       MsgBox strTmp

       ' Clean up gracefully without risking infinite loop in error handler
       On Error GoTo 0
End Sub

MS Access: Display A Subreport Even When There Are No Records

Seems like a lot of people are having a problem because Access automatically hides a subreport if it contains no records. Ref: PC Review, Experts Exchange, ASPFree. After digging through the various report and widget properties, there appears to be no property that will force the subreport to display when it has no records.

The way I finally got a subreport to display was to create a query which returns all the records, plus "blank" rows for nonexistent records. In this example, we have three tables: Orgs, People, and Positions. Positions has OrgID and PeopleID columns, and joins the other two tables. Positions also has a boolean column, HasRecord, indicating that a row is one of the records we're looking for; the table contains other records as well.

This query will get you the list of all the orgs, with additional columns wherever there is a matching row in Positions with HasRecord true:
SELECT a.OrgID, b.PeopleID FROM
( SELECT OrgID FROM Orgs ) a
LEFT JOIN
( SELECT OrgID, PeopleID FROM Positions WHERE HasRecord=TRUE ) b
ON a.OrgID=b.OrgID
That gets you your result set, and you can then add some columns with a regular join:
SELECT Name, Address, Title FROM
(
   ( SELECT OrgID FROM Orgs ) a
   LEFT JOIN
   ( SELECT OrgID, PeopleID FROM Positions WHERE HasRecord=TRUE ) b
   ON a.OrgID=b.OrgID
) c
LEFT JOIN People p
ON c.PeopleID=p.PeopleID
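Here's the blank-row behavior demonstrated with Python's sqlite3 (which doesn't need Jet's extra parentheses, and spells TRUE as 1):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orgs (OrgID INTEGER);
CREATE TABLE Positions (OrgID INTEGER, PeopleID INTEGER, HasRecord INTEGER);
INSERT INTO Orgs VALUES (1), (2);
INSERT INTO Positions VALUES (1, 10, 1), (2, 20, 0);
""")
rows = conn.execute("""
SELECT a.OrgID, b.PeopleID FROM
  ( SELECT OrgID FROM Orgs ) a
  LEFT JOIN
  ( SELECT OrgID, PeopleID FROM Positions WHERE HasRecord=1 ) b
  ON a.OrgID=b.OrgID
""").fetchall()
print(sorted(rows))
# [(1, 10), (2, None)] -- org 2 still gets a (blank) row
```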

What didn't work

One commenter suggested using a UNION query to add blank rows to the result. I tried doing this:
SELECT OrgID, Null AS PeopleID FROM Orgs
UNION
SELECT OrgID, PeopleID FROM Positions WHERE HasRecord=TRUE
This didn't work because you'd end up with a blank line even before records that exist. Another possibility was to do a LEFT JOIN between Orgs and Positions, and match not only on HasRecord=TRUE but also when it's NULL.
SELECT * FROM Orgs LEFT JOIN Positions ON Orgs.OrgID=Positions.OrgID 
WHERE HasRecord=True OR HasRecord IS NULL
That doesn't work because HasRecord, if it exists, is either True or False. So False is not included. This means if a related record exists in Positions, but is not HasRecord=True, then, an OrgID for that organization won't show up in the results. If we include "OR HasRecord=False" to the statement, we end up selecting everything, including records we don't want. It just doesn't work.

MS Access: Geocoding and Distance Reporting

This is some code and controls that help you geocode addresses, and prepare a report of addresses sorted by distance from a point.

It's based on the Excel Geocoding Tool, but expands on it by adding a few features, including caching of calculated locations.

Addresses are stored in their own table, and are normalized a little bit, so that you don't end up geocoding the same address over and over. (For example, if you have 50 people at an office, that location should only be geocoded once.)
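The caching idea, sketched in Python (geocode is a stand-in for whatever geocoding call you use; normalize is deliberately crude):

```python
_cache = {}

def normalize(addr):
    # crude normalization: uppercase and collapse whitespace, so
    # "1 Main St" and "1  main st " hit the same cache entry
    return " ".join(addr.upper().split())

def geocode_cached(addr, geocode):
    key = normalize(addr)
    if key not in _cache:
        _cache[key] = geocode(key)   # only call the service on a cache miss
    return _cache[key]
```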

The code also shows how to change the sql datasource of a report in VBA code.

The code's incomplete, and it's not a drop-in library. Integration will take some effort. There probably won't be any other "releases".

[I've been fixing up the code. This original code is a mess, and there are some weird things going on because I didn't understand VBA exception handling.]

GeocodingDistanceKit.zip (56.59 KB)

MS Access: Inserting Blank Rows

This is a way to insert empty or empty-like rows into a list of "seats" that contains not only reservations, but a number saying how many seats a group of people have. If the number is greater than the number of seats, this adds new blank rows for empty seats.
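The padding rule itself is tiny; a sketch in Python (the field names follow tblSeats):

```python
def pad_group(reservations, seats):
    # add blank rows until the group's reservations fill its allotted seats
    blanks = seats - len(reservations)
    return reservations + [{"LastName": "", "FirstName": ""}] * max(blanks, 0)

print(len(pad_group([{"LastName": "Doe", "FirstName": "Jan"}], 3)))
# 3
```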

Sub insertBlankRows()
    Dim dbs As Database, qdf As QueryDef, strSQL As String
    Dim rst As Recordset
    Set dbs = CurrentDb
    strSQL = "SELECT tblSeats.OrganizationId, [MaxOfSeats]-Count([OrganizationId]) AS Difference, " & _
            " Count(tblSeats.OrganizationId) AS CountOfOrganizationId, Max(tblSeats.Seats) AS MaxOfSeats " & _
            " FROM tblSeats GROUP BY tblSeats.OrganizationId;"
    ' strSQL above mirrors the saved query qryDifferences used here
    Set rst = dbs.OpenRecordset("qryDifferences", dbOpenForwardOnly)
    While (Not rst.EOF)
        For i = 1 To rst!Difference
            insSQL = "INSERT into tblSeats (OrganizationID, LastName, FirstName) VALUES (" _
                & rst!OrganizationId & ", '', '')"
            ' MsgBox (insSQL)
            dbs.Execute (insSQL)
        Next i
        rst.MoveNext
    Wend

End Sub

emptyrows.jpg (90.86 KB)

MS Access: Inserting Records with Visual Basic and DAO

This example shows you how to add records with VBA and DAO instead of with SQL queries. Sometimes, it's easier to do it this way. (The original intent was to simultaneously create a relation between the new record and another table, but this didn't happen.)

Public Sub importFolks()
    Dim dbs As Database
    Dim rstFrom As Recordset
    Dim rstTo As DAO.Recordset
    Set dbs = CurrentDb()
    Set rstFrom = dbs.OpenRecordset("tmp match up list to db")
    Set rstTo = dbs.OpenRecordset("tblActivists", dbOpenDynaset, dbAppendOnly)
    a = 0
    Do Until (rstFrom.EOF)
        rstTo.AddNew
        rstTo.Fields("Fname") = rstFrom.Fields("Field11")
        rstTo.Fields("Lname") = rstFrom.Fields("Field12")
        rstTo.Fields("Email") = Nz(rstFrom.Fields("email"))
        parts = ParsePhoneNumbers(Nz(rstFrom.Fields("Phone")), 1)
        rstTo.Fields("WCode") = parts(1)
        rstTo.Fields("WPhone") = parts(2)
        parts = ParsePhoneNumbers(Nz(rstFrom.Fields("FAX")), 1)
        rstTo.Fields("FCode") = parts(1)
        rstTo.Fields("Fax") = parts(2)
        rstTo.Fields("Cell") = rstFrom.Fields("cellNumber")
        rstTo.Update
        a = a + 1
        rstFrom.MoveNext
    Loop
End Sub

MS Access: Inserting and Deleting Contact Items With VBA

Gripe: VBA syntax is difficult. The object system is a little confusing too. It's just very hard to use. To make things even more difficult, the sample code out there is kind of *weird*. Maybe there's some good reasons for doing things their way, but, it just seems verbose, error prone, and hard to write, to me.

Here's some code that is the start of a library to work with Outlook's folders. It's based on some code samples from the web, refactored into something resembling a library.

The best feature is the function OLGetSubFolder, which returns a MAPI folder object for a given path. Totally useful.

I don't really understand why the first folder is under folders.Item(1), but the sample code used that, so I'm calling that the root folder. Maybe there are folders above that, and this is wrong.

Also featured in this code are a function to test for the existence of an object, and create folders.

Option Compare Database

Public Sub test()
    Dim foldroot As Outlook.MAPIFolder
    Dim foldr As Outlook.MAPIFolder
    Dim newfolder As Outlook.MAPIFolder

    Set foldroot = OLGetRootUserFolder()
    Set foldr = OLGetSubFolder(foldroot, "\\Contacts")
    Set foldr = OLMakeFolder(foldr, "Lists")
    Set newfolder = OLMakeFolder(foldr, "Executive Board")
    Set newfolder = OLMakeFolder(foldr, "Delegates")
    Set newfolder = OLMakeFolder(foldr, "COPE Board")
    OLExportQueryToFolder newfolder, "prmCOPEBOARD"
    Set newfolder = OLMakeFolder(foldr, "Affiliates Offices")
End Sub

Public Sub OLExportQueryToFolder(folder As Outlook.MAPIFolder, query As String)
    Dim sFname, sLname, sEmail As String
    Dim dbs As Database
    Dim rst As Recordset
    Set dbs = CurrentDb
    Set rst = dbs.OpenRecordset(query, dbOpenForwardOnly)
    While Not rst.EOF
        If IsNull(rst!Fname) Then sFname = "" Else sFname = rst!Fname
        If IsNull(rst!Lname) Then sLname = "" Else sLname = rst!Lname
        If IsNull(rst!email) Then sEmail = "" Else sEmail = rst!email
        OLInsertContactItem folder, sFname, sLname, sEmail
        rst.MoveNext
    Wend
End Sub

Public Function OLMakeFolder(foldr As Outlook.MAPIFolder, newfolder As String) As Outlook.MAPIFolder
    Dim f As Outlook.MAPIFolder
On Error GoTo FolderDoesNotExist
    Set f = foldr.folders(newfolder)
    Set OLMakeFolder = f
    Exit Function
FolderDoesNotExist:
    Set f = foldr.folders.Add(newfolder)
    Set OLMakeFolder = f
End Function

' based on http://www.programmingmsaccess.c...
Public Sub OLInsertContactItem(foldr As Outlook.MAPIFolder, ByVal first As String, ByVal last As String, ByVal email As String)
    Dim cit1 As Outlook.ContactItem
    Dim citc1 As Outlook.Items
    Set cit1 = foldr.Items.Add(olContactItem)
    With cit1
        .FirstName = first
        .LastName = last
        .Email1Address = email
    End With
End Sub

Private Sub OLDeleteAllInFolder(MAPIFolder As Outlook.MAPIFolder)
    Dim i As Outlook.Items
    Dim n As Long
    Set i = MAPIFolder.Items
    ' delete from the end, so the collection doesn't reindex underneath us
    For n = i.Count To 1 Step -1
        i.Item(n).Delete
    Next n
End Sub

' based on
Private Function OLGetSubFolder(MAPIFolderRoot As Outlook.MAPIFolder, folderPath As String) As Outlook.MAPIFolder
    Dim returnFolder As Object
    Dim parts() As String
    Dim part
    Set returnFolder = MAPIFolderRoot
    parts = Split(folderPath, "\")
    For Each part In parts
        ' Debug.Print "-" & part & "-"
        If part <> "" Then
            Set returnFolder = returnFolder.folders.Item(part)
        End If
    Next part
    Set OLGetSubFolder = returnFolder
End Function

Private Function OLGetRootUserFolder() As Outlook.MAPIFolder
    Dim ola1 As Outlook.Application
    Dim foldr As Outlook.MAPIFolder
    Set ola1 = CreateObject("Outlook.Application")
    Set OLGetRootUserFolder = ola1.GetNamespace("MAPI").folders.Item(1)
End Function

MS Access: Logging Messages

Here's some code to help you log messages to a table. First, make a table called tblLog, with at least these columns: Timestamp, User, Computer, Message. (You don't need a primary key.)

Set the default value of Timestamp to NOW().

Copy the following code into a code module.

Also, add a reference to the "Active DS Type Library". It has the Active Directory functions you need to discover the username.

Function StartUp()
    Dim dummy
    dummy = LogOpen()
    DoCmd.OpenForm "frmHidden", acNormal, , , , acHidden
    StartUp = Null
End Function

Function LogOpen()
    LogMessage ("User opened database.")
End Function

Function LogClose()
    LogMessage ("User closed database.")
End Function

Function LogMessage(Mess As String)
    Dim sysInfo As New ActiveDs.WinNTSystemInfo
    Dim UserName As String
    UserName = sysInfo.UserName
    If UserName <> "" Then
        Dim dbs As Database
        Dim rst As Recordset
        Set dbs = CurrentDb
        dbs.Execute ("INSERT INTO tblLog (User, Computer, Message) VALUES ('" & sysInfo.UserName & _
            "','" & sysInfo.ComputerName & _
            "','" & Mess & "')")
    End If
    LogMessage = True
End Function

(StartUp looks messed up. I don't know what I'm doing. There's also a pointless temporary variable in LogMessage.)

To enable startup and shutdown logging, create a macro called AutoExec, and in the macro, call the StartUp function.

Then create a new form called "frmHidden", and add a handler for the Close event. In that event, call the LogClose function. Save all that.

What's happening is that the frmHidden form is opened during startup, but is hidden. Then, during shutdown, its Close event handler is called. This is a crappy hack. Improvements are appreciated.

MS Access: Printing the Range of Data on the Page on a Report

I wanted to print a report that indicated the first and last item on each page, just like a dictionary has. You know: "Azeri - Babcock", "Milk - Minder". It makes it easier to flip through printouts.

This is how to do it. It will put the range in the footer. I haven't figured out how to do one in the header, which is what I originally wanted, but found too difficult to do. (There is probably a way.)

First, take your report, and add an unbound field to your report. Rename it to "Range". See the picture below.

Then, set up event handlers for the On Print event of each section. An explanation follows the picture. Here's my code:

Option Compare Database
Option Explicit
Public FirstRow As String
Public CurrentRow As String

' All this code fails.  I may need to work out a way to put ranges on the
' pages by running this report once to fill values, and again to
' re-populate the report with ranges.

Private Sub Detail_Print(Cancel As Integer, FormatCount As Integer)
    CurrentRow = [OrgName]
    If FirstRow = "" Then
        FirstRow = CurrentRow
    End If
End Sub

Private Sub PageFooterSection_Print(Cancel As Integer, FormatCount As Integer)
    [Range] = FirstRow & " to " & CurrentRow
End Sub

Private Sub PageHeaderSection_Print(Cancel As Integer, FormatCount As Integer)
    ' clear out the tracking variable
    FirstRow = ""
End Sub

Okay, it's pretty simple. Every report is made up of parts, and Access has added a couple events to the different parts, so you can execute code while the report renders.

This code keeps track of the first and current values of OrgName (the field we sort and group on). When we get to the footer, the current value now holds the last value. These two values are concatenated, and then written to the [Range] field.

Putting this value at the top of the page is hard, because the top is laid out before the bottom, and I can't figure out a way to cause the top to be reformatted before the final rendering.

Range.jpg (114.05 KB)

MS Access: Quoting Strings in SQL

I was having a real WTF moment with Access. I'd coded up an SQL query in access, and a string had a single quote in it, fouling up the query.

The SQL was something like this:

SELECT * FROM Places WHERE Name='Joe's Bar'

Obviously, I forgot to quote the string correctly. For some reason, web searches didn't really turn up much about quoting text strings in SQL statements in Access. There was a lot of code that looked like this:

sql = "SELECT * FROM Places WHERE NAME='" & name & "'"

My code was like that too, because that's what everyone was doing. What's funny is that I've used parameterized queries in Java, and written similar tools for PHP, but back in VBA, I use that broken style.

Knowing the right way to do it, I googled for notes about using parameterized queries in MS Access and Jet. It looked hard. It also looked verbose, and it was a little confusing.

Further searches turned up results about quoting strings, but they were kind of "not pretty":

sql = "SELECT * FROM Places WHERE NAME='" & Replace(name,"'","''") & "'"

Well, at least it's explicit.

Instead, here's a half-way solution that cleans up the code a bit. It's inspired by Perl::DBI's quote function, which will escape quotes and also add quotes around the string:

' Single quote a string (and escape contents)
Public Function SQuote(s As String) As String
    SQuote = "'" & Replace(s, "'", "''") & "'"
End Function

' Adds a comma, so you can create constructions like:
'   SQuoteComma(foo) & SQuoteComma(bar)
' Result: 'foo''svalue','bar''svalue',
' (note the trailing comma)
Public Function SQuoteComma(s As String) As String
    SQuoteComma = SQuote(s) & ","
End Function

Public Function DQuote(s As String) As String
    DQuote = """" & Replace(s, """", """""") & """"
End Function

Public Function DQuoteComma(s As String) As String
    DQuoteComma = DQuote(s) & ","
End Function

Now the statement looks like this:

sql = "SELECT * FROM Places WHERE NAME=" & SQuote(name)

Also, if you have an INSERT statement, you can construct a comma-separated list of strings like this:

sql = "INSERT INTO Places (Name,Street,City) VALUES (" & _
	SQuoteComma(name) & SQuoteComma(street) & SQuote(city) & _
	")"
Even with the long function name, it's fewer characters than "'" & "'".

MS Access:Can't Add New Record to Subform

A subform we were entering data into stopped working. One day it was working; the next, it was not. The problem turned out to be the data source: the underlying query started with "SELECT DISTINCT". Probably because there were duplicate records in the underlying table, the query made the form stop accepting edits -- it became read-only. The solution was to set the query's Unique Values property to "No", which removed the DISTINCT from the query.

Some posts on the web say as much: the record source has to be writeable, meaning it can't be a UNION, most JOINs, or a DISTINCT.
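
For instance (table and column names here are hypothetical), the first query below becomes a read-only record source, while the second stays updateable:

```sql
-- Read-only as a form's record source:
SELECT DISTINCT Name, Street FROM Places;

-- Updateable:
SELECT Name, Street FROM Places;
```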

MS Excel: Cleverer Table Importer

These are some functions that help you write a script to import Excel data into a SQL database. What makes this different from the Access import feature is that the data can be poorly formatted. This specific code is for the Crystal Reports export feature. Crystal exports data by converting the final output to an Excel sheet, but the sheet includes the headers and titles, as well as blank columns. In short, it's not ready to import.

Additionally, the CSV export feature of Crystal spits out incomplete data, so the Excel export is the best option.

So, what we need is an importer that can read data with empty columns, and with a header line that starts a few rows down the page.

This partially completed importer works by finding the header line, then analyzing it for column names, noting which column name goes with which column number. With those offsets, it loops over the table, mapping each column back to its name, and uses that mapping to build an SQL string to insert the data. We also pass in some hints about which fields to quote, and which to convert from date serials to textual dates.

This code doesn't yet have the necessary code to import the data into the table. The final version of the code will run within Access, and control an instance of Excel.

Public Sub test()
    ' requires a reference to the Microsoft Scripting Runtime (for Dictionary)
    Dim offsets As Dictionary
    Dim quotes As New Dictionary
    Dim row As Dictionary
    Dim dest As New Dictionary
    quotes.Add "code", "quote"
    quotes.Add "PaidThrough", "date"
    quotes.Add "Mems", "number"
    quotes.Add "UpdateTime", "quote"
    n = Format(Now(), "yyyy/mm/dd")
    import_goto_start ("Customer #")
    Set offsets = import_get_heading_offsets
    While (Application.Selection <> "")
        ' move cursor down one cell
        Application.ActiveCell.Offset(1, 0).Select
        Set row = import_get_row(offsets)
        dest.RemoveAll   ' clear out the previous row's values
        dest.Add "code", row("Customer #")
        dest.Add "PaidThrough", row("through")
        dest.Add "Mems", row("Members")
        dest.Add "UpdateTime", n
        Sql = import_build_sql("foo", dest, quotes)
        Debug.Print Sql
    Wend
End Sub

Public Sub import_goto_start(search As String)
    ' moves cursor to the first likely line of data, which is the first
    ' cell of the header row.  Call this before anything else.
    r = 1
    While (r < 20)
        c = 1
        While (c < 5)
            With Workbooks(1).Worksheets(1)
                If (.Cells(r, c) = search) Then
                    .Cells(r, c).Select
                    Exit Sub
                End If
            End With
            c = c + 1
        Wend
        r = r + 1
    Wend
End Sub

Function import_get_heading_offsets() As Dictionary
    ' returns a dictionary mapping column numbers to field names
    Dim res As New Dictionary
    Dim r As Integer
    Dim c As Integer
    With Workbooks(1).Worksheets(1)
        c = Application.ActiveCell.Column
        r = Application.ActiveCell.row
        For col = c To 100
            Heading = .Cells(r, col).Value2
            If Heading <> "" Then
                res.Add col, Heading
            End If
        Next col
    End With
    ' return that dictionary
    Set import_get_heading_offsets = res
End Function

Function import_get_row(offsets As Dictionary) As Dictionary
    ' returns a row of data as an associative array
    Dim res As New Dictionary
    With Workbooks(1).Worksheets(1)
        r = Application.ActiveCell.row
        ' what is the way to scan the row based on the collection's contents???
        For col = 1 To 10
            If offsets.Exists(col) Then
                res.Add offsets.Item(col), .Cells(r, col).Value2
                'Debug.Print "Adding " & .Cells(r, col).Value2 & " : " & offsets.Item(col)
            End If
        Next col
    End With
    Set import_get_row = res
End Function

Function import_build_sql(table As String, data As Dictionary, quotes As Dictionary) As String
    ' takes an associative array as input and generates an "insert"
    ' for the table.  the field names must match.
    flds = ""
    vals = ""
    For Each d In data
        If flds <> "" Then
            flds = flds & ", "
            vals = vals & ", "
        End If
        flds = flds & d
        If (quotes(d) = "quote") Then
            vals = vals & "'" & data(d) & "'"
        ElseIf (quotes(d) = "date") Then
            vals = vals & "'" & Format(data(d), "yyyy/mm/dd") & "'"
        Else
            vals = vals & data(d)
        End If
    Next d
    import_build_sql = "INSERT INTO " & table & " (" & flds & ") VALUES (" & vals & ")"
End Function

' PHP pseudocode
' offsets = import_get_heading_offsets()
' while( row = import_get_row(offsets) ) :
'    new['field1'] = row['fieldx']
'    ...
'    sql = import_build_sql('table', new)
'    cn.execute sql
' endwhile

The code's a little bit dirty. VBA Dictionaries were hard to learn, because the MS docs tend to have only simple example code. There are a few places I wish were more efficient.

MS Excel: Moving Cursor to the First Occurrence of a String

This works like "find", but you can restrict the range it searches (by editing the code). You can also switch the loops around so it scans columns before rows.

This code is part of an Excel importer project for Access. The data is kinda weird, and can't be imported via the normal importer. I'm using FunctionX's VBA for Excel tutorial as a reference.

Public Sub test()
    import_goto_start ("Customer #")
End Sub

Public Sub import_goto_start(search As String)
    ' moves cursor to the first likely line of data, which is the first
    ' cell of the header row.  Call this before anything else.
    r = 1
    While (r < 20)
        c = "A"
        While (c <> "E")
            With Workbooks(1).Worksheets(1)
                If (.Range(c & r) = search) Then
                    .Range(c & r).Select
                    Exit Sub
                End If
            End With
            c = Chr(Asc(c) + 1)
        Wend
        r = r + 1
    Wend
End Sub

MS Outlook and Access: Recording Bounced Email Addresses

This is the start of a macro that will scan your Outlook Inbox or a subfolder named "Bounces" for bounce messages, and record such messages to an Access database.

The BouncingEmails.mdb file contains a single table, named "bounces", that has a single column named "email".

This code will only match qmail and Exchange server bounce messages. Each server has its own message format, so each format needs a little code of its own.

' This scans the current folder and copies the bouncing email address to
' C:\DB\BouncingEmails.mdb

Public Sub CopyBouncedAddressesToDatabase()
    Dim conn As New ADODB.Connection
    Dim cmd As New ADODB.Command
    Dim rs As New ADODB.Recordset
    Dim AccessConnect As String
    AccessConnect = "Driver={Microsoft Access Driver (*.mdb)};" & _
                    "Dbq=BouncingEmails.mdb;" & _
                    "DefaultDir=C:\DB;"
    conn.Open AccessConnect
    Dim inbox As Outlook.MAPIFolder, bounces As Outlook.MAPIFolder
    Dim mail As Variant
    Dim body As String
    Dim lines As Variant
    Dim address As Variant
    Dim addressarray As Variant
    Set inbox = Outlook.Application.GetNamespace("MAPI").GetDefaultFolder(olFolderInbox)
    On Error GoTo NoBounces
    Set bounces = inbox.Folders.item("Bounces")
    On Error GoTo 0

    ct = bounces.Items.Count
    For i = ct To 1 Step -1
        Set mail = bounces.Items(i)
        lines = Split(mail.body, vbCrLf, 50)
        If UBound(lines) > 7 Then
            If lines(1) = "I'm afraid I wasn't able to deliver your message to the following addresses." _
                And InStr(lines(4), "@") Then
                    ' matches qmail bounces
                    address = Mid(lines(4), 2)
                    address = Left(address, Len(address) - 2)
                    conn.Execute "INSERT INTO bounces (email) VALUES ('" & address & "')"
            ElseIf lines(0) = "Your message did not reach some or all of the intended recipients." _
                And InStr(lines(7), "@") Then
                    ' matches exchange bounces
                    address = LTrim(lines(7))
                    addressarray = Split(address)
                    address = addressarray(0)
                    conn.Execute "INSERT INTO bounces (email) VALUES ('" & address & "')"
            End If
        End If ' lines.count > 7
    Next i
    Exit Sub
NoBounces:
    ' called if the bounces folder does not exist
    Set bounces = inbox
    Resume Next
End Sub

MS Outlook: Remove Duplicate Contacts

This is a pretty good de-duper, based on one posted to a forum. This version normalizes some of the data, so records match even when they're formatted differently.
' by pbj75

Public Sub deleteDuplicateContacts()
    Dim oldcontact As ContactItem, newcontact As ContactItem, j As Integer
    Set myNameSpace = GetNamespace("MAPI")
    Set myfolder = myNameSpace.GetDefaultFolder(olFolderContacts)
    Set myitems = myfolder.Items
    myitems.Sort "[File As]", olDescending
    totalcount = myitems.Count
    j = 1
    While ((j < totalcount) And (myitems(j).Class <> olContact))
        j = j + 1
    Wend
    Set oldcontact = myitems(j)
    For i = j + 1 To totalcount
        If (myitems(i).Class = olContact) Then
            Set newcontact = myitems(i)
            If ((newcontact.LastNameAndFirstName = oldcontact.LastNameAndFirstName) And _
                (NormPhone(newcontact.PagerNumber) = NormPhone(oldcontact.PagerNumber)) And _
                (NormPhone(newcontact.MobileTelephoneNumber) = NormPhone(oldcontact.MobileTelephoneNumber)) And _
                (NormPhone(newcontact.HomeTelephoneNumber) = NormPhone(oldcontact.HomeTelephoneNumber)) And _
                (NormPhone(newcontact.BusinessTelephoneNumber) = NormPhone(oldcontact.BusinessTelephoneNumber)) And _
                (NormAddress(newcontact.BusinessAddress) = NormAddress(oldcontact.BusinessAddress)) And _
                (newcontact.Email1Address = oldcontact.Email1Address) And _
                (newcontact.HomeAddress = oldcontact.HomeAddress) And _
                (newcontact.CompanyName = oldcontact.CompanyName)) Then
                'use FTPSite as a flag to mark duplicates
                newcontact.FTPSite = "DELETEME"
                newcontact.Save
            End If
            Set oldcontact = newcontact
        End If
    Next i
End Sub

Public Function NormPhone(ByVal p As String) As String
    ' first, replace . with -
    p = Replace(p, ".", "-")
    ' second if the 4th character is "-" then change the format to (nnn) nnn-nnnn
    If (Mid(p, 4, 1) = "-") Then
        p = "(" & Mid(p, 1, 3) & ") " & Mid(p, 5)
    End If
    If (Mid(p, 5, 1) = ")" And Mid(p, 6, 1) <> " ") Then
        p = Mid(p, 1, 5) & " " & Mid(p, 6)
    End If
    NormPhone = p
End Function

Public Function NormAddress(ByVal a As String) As String
    a = Replace(a, "USA", "")
    a = Replace(a, "United States of America", "")
    a = RTrim(a)
    a = Replace(a, vbCrLf, " ")
    a = Replace(a, vbCr, " ")
    a = Replace(a, vbLf, " ")
    a = Replace(a, "  ", " ")
    a = Replace(a, "  ", " ")
    a = Replace(a, "  ", " ")
    NormAddress = a
End Function

MS Outlook: Spamassassin Training with MIME Email (.EML) Files

Here's a VBA script that I'm using to train Spamassassin from Outlook. It saves out email messages to a file server where messages are used to train the filter. The problem here is that Outlook doesn't save EML (MIME format) files. You can save messages as text, but lately, spammers have been loading messages with a lot of chaff text that looks like regular email. You can't train with that, because it might cause the filter to start mis-identifying legit email as spam.

The chaff is usually in the HTML as white text, at a small font size. So the user never sees it, but the filter's supposed to see it.

The partial solution is to save the messages as regular email, as .EML (MIME format) files, with the HTML parts intact. Spamassassin seems to have code that treats obfuscated HTML correctly. That way, the white text is kept out of the training.

This code is very raw. Plenty of things to fix, like error handling, but it is working right now. The code is set up not to save out text versions of the email.

To use it, go to a folder, select the spam, and run the MarkAsSpam macro.

This is intended to be used by the sysadmin. I have learned that end-user spam filtering is hit and miss. Some people use spam filters to block legit email rather than unsubscribe from the messages.

Sub MarkAsHam()
    CopyMessagesToFile ("\\mailfilter\spamassassin-ham\")
End Sub

Sub MarkAsSpam()
    CopyMessagesToFile ("\\mailfilter\spamassassin-spam\")
End Sub

' Move the selected message(s) to the given folder **************************
Function CopyMessagesToFile(folderName As String)

    Dim myOLApp As Application
    Dim myNameSpace As NameSpace
    Dim myInbox As MAPIFolder
    Dim currentMessage As MailItem
    Dim errorReport As String
    Set myOLApp = CreateObject("Outlook.Application")
    Set myNameSpace = myOLApp.GetNamespace("MAPI")
    Set myInbox = myNameSpace.GetDefaultFolder(olFolderInbox)

    ' Figure out if the active window is a list of messages or one message
    ' in its own window
    On Error GoTo QuitIfError    ' But if there's a problem, skip it
    Select Case myOLApp.ActiveWindow.Class
        ' The active window is a list of messages (folder); this means there
        ' might be several selected messages
        Case olExplorer
            Debug.Print "list of messages"
            For Each currentMessage In myOLApp.ActiveExplorer.Selection
                Call writeAsFile(folderName, currentMessage)
            Next currentMessage
        ' The active window is a message window, meaning there will only
        ' be one selected message (the one in this window)
        Case olInspector
            Call writeAsFile(folderName, myOLApp.ActiveInspector.CurrentItem)
        ' can't handle any other kind of window; anything else will be ignored
    End Select
QuitIfError:       ' Come here if there was some kind of problem
    Set myOLApp = Nothing
    Set myNameSpace = Nothing
    Set myInbox = Nothing
    Set currentMessage = Nothing
End Function

Sub writeAsFile(folderName As String, item As MailItem)
    On Error GoTo Bail
    Dim x As MailItem
    Dim fn As String
    Set x = item
    'Let fn = folderName & Right(x.EntryID, 64) & ".txt"
    'Debug.Print "file will be " & fn
    'Open fn For Output As #1
    '    Print #1, "From : " & x.SenderEmailAddress
    '    Print #1, "To: " & x.To
    '    Print #1, "Subject: " & x.Subject
    '    Print #1, vbCrLf & vbCrLf
    '    Print #1, x.body
    Let fn = folderName & Right(x.EntryID, 64) & ".eml"
    Debug.Print "file will be " & fn
    Open fn For Output As #2
        Print #2, "From: " & x.SenderEmailAddress
        Print #2, "To: " & x.To
        Print #2, "Subject: " & x.Subject
        Print #2, "MIME-Version: 1.0"
        Print #2, "Content-Type: multipart/alternative;"
        Print #2, "        boundary=""----=_NextPart_000_000D_01CCF6AD.D1159750"""
        Print #2, "Content-Language: en-us"
        Print #2, ""
        Print #2, "This is a multipart message in MIME format."
        Print #2, ""
        Print #2, "------=_NextPart_000_000D_01CCF6AD.D1159750"
        Print #2, "Content-Type: text/plain;"
        Print #2, "        charset=""us-ascii"""
        Print #2, "Content-Transfer-Encoding: 7bit"
        Print #2, ""
        Print #2, item.body
        Print #2, "------=_NextPart_000_000D_01CCF6AD.D1159750"
        Print #2, "Content-Type: text/html;"
        Print #2, "        charset=""UTF-8"""
        Print #2, "Content-Transfer-Encoding: 7bit"
        Print #2, "Content-Disposition: inline"
        Print #2, ""
        Print #2, item.HTMLBody
        Print #2, "------=_NextPart_000_000D_01CCF6AD.D1159750--"
    On Error GoTo 0

Bail:
    Close #2
    Set item = Nothing
End Sub

MSAccess: Showing "Continue..." Conditionally at the bottom of a Section in a Report

Maybe I'm missing something - but it looks like Access doesn't have this feature - putting "Continued..." or "More..." at the bottom of a section when the section spills onto the next page. If it exists, please comment or email me at johnk@ the domain name of this site. I seriously hope it exists.

I have this complex report that is a little non-standard - and here's how I did it. The general technique is at this other post:
Printing a Repeated Section Message like "Continued"

My function is this:

([txtDetailNum]>11 and [txtDetailNum]=MaxValue([NamedDelegates],[EligibleDelegates]) and [EligibleDelegates]<21) 
or ([txtDetailNum]=20) 
or ([txtDetailNum]=40) 
or ([txtDetailNum]=60),

The MaxValue function is defined like this:

Public Function MaxValue(a, b)
    a = Val(a)
    b = Val(b)
    If (a > b) Then
        MaxValue = a
    Else
        MaxValue = b
    End If
End Function

MaxValue is a lot like the traditional max() but it converts strings to numbers first, because it looks like values in Access reports might become strings.

The logic to show the message works for me, but there's a bug in there. When txtDetailNum is within a range where the list ends near the bottom of the page, it should show the message, because the footer gets bumped to the next page. That logic is expressed in the first part of the expression:
([txtDetailNum]>11 and [txtDetailNum]=MaxValue([NamedDelegates],[EligibleDelegates]) and [EligibleDelegates]<21)

(The MaxValue part deals with a data glitch when the number of named delegates > eligible delegates.)

So the entire expression should have lines like that throughout in addition to txtDetailNum=20. It just turned out that my data didn't end in the high 30s or high 50s.

A correct expression would be a bit more complex, and should use VBA. You'd need to define a function that returns true if "Continued..." should be printed. The logic would be something like this:

function printContinue(txtDetailNum)
    pagePosition = txtDetailNum mod recordsPerPage
    if (pagePosition >= recordThatWouldTriggerBreak) and (txtDetailNum == lastRow) then
        return true
    end if
    if (pagePosition == recordsPerPage - 1) then
        return true
    end if
    return false
end function
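
To sanity-check the arithmetic, here's the same idea as a small Python model. records_per_page, trigger_row, and last_row are stand-ins for values you'd measure from your own report layout, not anything Access supplies:

```python
def print_continue(detail_num, records_per_page, trigger_row, last_row):
    # which row of its page this detail record lands on
    page_position = detail_num % records_per_page
    # the group ends low enough on the page that the footer gets bumped over
    if page_position >= trigger_row and detail_num == last_row:
        return True
    # the last detail row that fits on a page: the group continues overleaf
    if page_position == records_per_page - 1:
        return True
    return False
```

With records_per_page=20, trigger_row=12, and last_row=53, rows 19 and 53 print the message, and row 33 does not.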

MSAccess: VBA CRC32 Function

Here's a CRC32 function based on the work at: cCRC32.

The main difference is that this is a plain function, with the crc32 table written out as literal assignments instead of being computed. (As written, the assignments still run on every call; declaring the table Static, with a check so it's only filled once, would avoid that.) If there's a way to do true constant arrays in VBA, I'd like to know. I haven't found anything online.

Function CRC32(str As String)
    Dim crc32Table(256) As Long
    crc32Table(0) = 0
    crc32Table(1) = 1996959894
    crc32Table(2) = -301047508
    crc32Table(3) = -1727442502
    crc32Table(4) = 124634137
    crc32Table(5) = 1886057615
    crc32Table(6) = -379345611
    crc32Table(7) = -1637575261
    crc32Table(8) = 249268274
    crc32Table(9) = 2044508324
    crc32Table(10) = -522852066
    crc32Table(11) = -1747789432
    crc32Table(12) = 162941995
    crc32Table(13) = 2125561021
    crc32Table(14) = -407360249
    crc32Table(15) = -1866523247
    crc32Table(16) = 498536548
    crc32Table(17) = 1789927666
    crc32Table(18) = -205950648
    crc32Table(19) = -2067906082
    crc32Table(20) = 450548861
    crc32Table(21) = 1843258603
    crc32Table(22) = -187386543
    crc32Table(23) = -2083289657
    crc32Table(24) = 325883990
    crc32Table(25) = 1684777152
    crc32Table(26) = -43845254
    crc32Table(27) = -1973040660
    crc32Table(28) = 335633487
    crc32Table(29) = 1661365465
    crc32Table(30) = -99664541
    crc32Table(31) = -1928851979
    crc32Table(32) = 997073096
    crc32Table(33) = 1281953886
    crc32Table(34) = -715111964
    crc32Table(35) = -1570279054
    crc32Table(36) = 1006888145
    crc32Table(37) = 1258607687
    crc32Table(38) = -770865667
    crc32Table(39) = -1526024853
    crc32Table(40) = 901097722
    crc32Table(41) = 1119000684
    crc32Table(42) = -608450090
    crc32Table(43) = -1396901568
    crc32Table(44) = 853044451
    crc32Table(45) = 1172266101
    crc32Table(46) = -589951537
    crc32Table(47) = -1412350631
    crc32Table(48) = 651767980
    crc32Table(49) = 1373503546
    crc32Table(50) = -925412992
    crc32Table(51) = -1076862698
    crc32Table(52) = 565507253
    crc32Table(53) = 1454621731
    crc32Table(54) = -809855591
    crc32Table(55) = -1195530993
    crc32Table(56) = 671266974
    crc32Table(57) = 1594198024
    crc32Table(58) = -972236366
    crc32Table(59) = -1324619484
    crc32Table(60) = 795835527
    crc32Table(61) = 1483230225
    crc32Table(62) = -1050600021
    crc32Table(63) = -1234817731
    crc32Table(64) = 1994146192
    crc32Table(65) = 31158534
    crc32Table(66) = -1731059524
    crc32Table(67) = -271249366
    crc32Table(68) = 1907459465
    crc32Table(69) = 112637215
    crc32Table(70) = -1614814043
    crc32Table(71) = -390540237
    crc32Table(72) = 2013776290
    crc32Table(73) = 251722036
    crc32Table(74) = -1777751922
    crc32Table(75) = -519137256
    crc32Table(76) = 2137656763
    crc32Table(77) = 141376813
    crc32Table(78) = -1855689577
    crc32Table(79) = -429695999
    crc32Table(80) = 1802195444
    crc32Table(81) = 476864866
    crc32Table(82) = -2056965928
    crc32Table(83) = -228458418
    crc32Table(84) = 1812370925
    crc32Table(85) = 453092731
    crc32Table(86) = -2113342271
    crc32Table(87) = -183516073
    crc32Table(88) = 1706088902
    crc32Table(89) = 314042704
    crc32Table(90) = -1950435094
    crc32Table(91) = -54949764
    crc32Table(92) = 1658658271
    crc32Table(93) = 366619977
    crc32Table(94) = -1932296973
    crc32Table(95) = -69972891
    crc32Table(96) = 1303535960
    crc32Table(97) = 984961486
    crc32Table(98) = -1547960204
    crc32Table(99) = -725929758
    crc32Table(100) = 1256170817
    crc32Table(101) = 1037604311
    crc32Table(102) = -1529756563
    crc32Table(103) = -740887301
    crc32Table(104) = 1131014506
    crc32Table(105) = 879679996
    crc32Table(106) = -1385723834
    crc32Table(107) = -631195440
    crc32Table(108) = 1141124467
    crc32Table(109) = 855842277
    crc32Table(110) = -1442165665
    crc32Table(111) = -586318647
    crc32Table(112) = 1342533948
    crc32Table(113) = 654459306
    crc32Table(114) = -1106571248
    crc32Table(115) = -921952122
    crc32Table(116) = 1466479909
    crc32Table(117) = 544179635
    crc32Table(118) = -1184443383
    crc32Table(119) = -832445281
    crc32Table(120) = 1591671054
    crc32Table(121) = 702138776
    crc32Table(122) = -1328506846
    crc32Table(123) = -942167884
    crc32Table(124) = 1504918807
    crc32Table(125) = 783551873
    crc32Table(126) = -1212326853
    crc32Table(127) = -1061524307
    crc32Table(128) = -306674912
    crc32Table(129) = -1698712650
    crc32Table(130) = 62317068
    crc32Table(131) = 1957810842
    crc32Table(132) = -355121351
    crc32Table(133) = -1647151185
    crc32Table(134) = 81470997
    crc32Table(135) = 1943803523
    crc32Table(136) = -480048366
    crc32Table(137) = -1805370492
    crc32Table(138) = 225274430
    crc32Table(139) = 2053790376
    crc32Table(140) = -468791541
    crc32Table(141) = -1828061283
    crc32Table(142) = 167816743
    crc32Table(143) = 2097651377
    crc32Table(144) = -267414716
    crc32Table(145) = -2029476910
    crc32Table(146) = 503444072
    crc32Table(147) = 1762050814
    crc32Table(148) = -144550051
    crc32Table(149) = -2140837941
    crc32Table(150) = 426522225
    crc32Table(151) = 1852507879
    crc32Table(152) = -19653770
    crc32Table(153) = -1982649376
    crc32Table(154) = 282753626
    crc32Table(155) = 1742555852
    crc32Table(156) = -105259153
    crc32Table(157) = -1900089351
    crc32Table(158) = 397917763
    crc32Table(159) = 1622183637
    crc32Table(160) = -690576408
    crc32Table(161) = -1580100738
    crc32Table(162) = 953729732
    crc32Table(163) = 1340076626
    crc32Table(164) = -776247311
    crc32Table(165) = -1497606297
    crc32Table(166) = 1068828381
    crc32Table(167) = 1219638859
    crc32Table(168) = -670225446
    crc32Table(169) = -1358292148
    crc32Table(170) = 906185462
    crc32Table(171) = 1090812512
    crc32Table(172) = -547295293
    crc32Table(173) = -1469587627
    crc32Table(174) = 829329135
    crc32Table(175) = 1181335161
    crc32Table(176) = -882789492
    crc32Table(177) = -1134132454
    crc32Table(178) = 628085408
    crc32Table(179) = 1382605366
    crc32Table(180) = -871598187
    crc32Table(181) = -1156888829
    crc32Table(182) = 570562233
    crc32Table(183) = 1426400815
    crc32Table(184) = -977650754
    crc32Table(185) = -1296233688
    crc32Table(186) = 733239954
    crc32Table(187) = 1555261956
    crc32Table(188) = -1026031705
    crc32Table(189) = -1244606671
    crc32Table(190) = 752459403
    crc32Table(191) = 1541320221
    crc32Table(192) = -1687895376
    crc32Table(193) = -328994266
    crc32Table(194) = 1969922972
    crc32Table(195) = 40735498
    crc32Table(196) = -1677130071
    crc32Table(197) = -351390145
    crc32Table(198) = 1913087877
    crc32Table(199) = 83908371
    crc32Table(200) = -1782625662
    crc32Table(201) = -491226604
    crc32Table(202) = 2075208622
    crc32Table(203) = 213261112
    crc32Table(204) = -1831694693
    crc32Table(205) = -438977011
    crc32Table(206) = 2094854071
    crc32Table(207) = 198958881
    crc32Table(208) = -2032938284
    crc32Table(209) = -237706686
    crc32Table(210) = 1759359992
    crc32Table(211) = 534414190
    crc32Table(212) = -2118248755
    crc32Table(213) = -155638181
    crc32Table(214) = 1873836001
    crc32Table(215) = 414664567
    crc32Table(216) = -2012718362
    crc32Table(217) = -15766928
    crc32Table(218) = 1711684554
    crc32Table(219) = 285281116
    crc32Table(220) = -1889165569
    crc32Table(221) = -127750551
    crc32Table(222) = 1634467795
    crc32Table(223) = 376229701
    crc32Table(224) = -1609899400
    crc32Table(225) = -686959890
    crc32Table(226) = 1308918612
    crc32Table(227) = 956543938
    crc32Table(228) = -1486412191
    crc32Table(229) = -799009033
    crc32Table(230) = 1231636301
    crc32Table(231) = 1047427035
    crc32Table(232) = -1362007478
    crc32Table(233) = -640263460
    crc32Table(234) = 1088359270
    crc32Table(235) = 936918000
    crc32Table(236) = -1447252397
    crc32Table(237) = -558129467
    crc32Table(238) = 1202900863
    crc32Table(239) = 817233897
    crc32Table(240) = -1111625188
    crc32Table(241) = -893730166
    crc32Table(242) = 1404277552
    crc32Table(243) = 615818150
    crc32Table(244) = -1160759803
    crc32Table(245) = -841546093
    crc32Table(246) = 1423857449
    crc32Table(247) = 601450431
    crc32Table(248) = -1285129682
    crc32Table(249) = -1000256840
    crc32Table(250) = 1567103746
    crc32Table(251) = 711928724
    crc32Table(252) = -1274298825
    crc32Table(253) = -1022587231
    crc32Table(254) = 1510334235
    crc32Table(255) = 755167117

   Dim crc32Result As Long
   crc32Result = &HFFFFFFFF
   Dim i As Integer
   Dim iLookup As Integer
   Dim buffer() As Byte
   buffer = StrConv(str, vbFromUnicode)
   For i = LBound(buffer) To UBound(buffer)
      iLookup = (crc32Result And &HFF) Xor buffer(i)
      crc32Result = ((crc32Result And &HFFFFFF00) \ &H100) And 16777215
      ' nasty shr 8 with vb :/
      crc32Result = crc32Result Xor crc32Table(iLookup)
   Next i
   CRC32 = Not (crc32Result)

End Function
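
One way to sanity-check a CRC32 port like this is the standard check value: the CRC-32 of the ASCII string "123456789" is &HCBF43926. Python's zlib implements the same polynomial, so it makes a quick reference:

```python
import zlib

# Standard CRC-32 check value; any correct implementation of this
# polynomial should agree. VBA will show the same bit pattern as a
# negative Long, since the high bit is set.
crc = zlib.crc32(b"123456789") & 0xFFFFFFFF
print(hex(crc))  # 0xcbf43926
```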

Mini-HOWTO: mini_snmpd, a small snmpd for embedded systems like OpenWrt

mini_snmpd is a GPL snmpd by Robert Ernst.

It doesn't include docs, so here are some starter docs. The OpenWrt version can be installed from the package manager, but it doesn't seem to include a startup script for init.d, so I'll try to whip one up here, as well.

First, you'll need command line access to the device, so if it's an OpenWrt router, install the dropbear ssh server and log in via SSH as root.

To see the help for mini_snmpd, use the -h option:

mini_snmpd -h

You should see:

usage: mini_snmpd [options]

-p, --udp-port nnn     set the UDP port to bind to (161)
-P, --tcp-port nnn     set the TCP port to bind to (161)
-c, --community nnn    set the community string (public)
-D, --description nnn  set the system description (empty)
-V, --vendor nnn       set the system vendor (empty)
-L, --location nnn     set the system location (empty)
-C, --contact nnn      set the system contact (empty)
-d, --disks nnn        set the disks to monitor (/)
-i, --interfaces nnn   set the network interfaces to monitor (lo)
-t, --timeout nnn      set the timeout for MIB updates (1 second)
-a, --auth             require authentication (thus SNMP version 2c)
-v, --verbose          verbose syslog messages 
-l, --licensing        print licensing info and exit
-h, --help             print this help and exit

The default values are in parens.

On a router, you usually want to see the traffic going through the network interfaces. Here's a run of mini_snmpd that exposed those values:

mini_snmpd -i eth0.1,wl0,br-lan

Your command line may differ, depending on your router and network configuration. Mine was OpenWrt on a Linksys WRT54g or gl.

Back on your desktop (or whatever computer will be querying for snmp stats), use the net-snmp tools to poll for data. I learned how in this tutorial, Simple SNMP with Linux, by Jason Philbrook.

Run this:

snmpwalk -v 1 -c public <router-ip>

Here, <router-ip> is the router's address. -v means version, and -c means community name: version 1, community "public". mini_snmpd seems to ignore the community name, but it must be supplied. The output I got was:

iso. = ""
iso. = OID: iso.
iso. = Timeticks: (412) 0:00:04.12
iso. = ""
iso. = STRING: "OpenWrt"
iso. = ""
iso. = INTEGER: 3
iso. = INTEGER: 1
iso. = INTEGER: 2
iso. = INTEGER: 3
iso. = STRING: "eth0.1"
iso. = STRING: "wl0"
iso. = STRING: "br-lan"
iso. = INTEGER: 1
iso. = INTEGER: 1
iso. = INTEGER: 1
iso. = Counter32: 1758591601
iso. = Counter32: 3436817368
iso. = Counter32: 1913312114
iso. = Counter32: 122466684
iso. = Counter32: 6121670
iso. = Counter32: 110670073
iso. = Counter32: 0
iso. = Counter32: 0
iso. = Counter32: 0
iso. = Counter32: 0
iso. = Counter32: 85
iso. = Counter32: 0
iso. = Counter32: 4073775119
iso. = Counter32: 980016090
iso. = Counter32: 2726650206
iso. = Counter32: 112229579
iso. = Counter32: 6270244
iso. = Counter32: 120445972
iso. = Counter32: 0
iso. = Counter32: 0
iso. = Counter32: 0
iso. = Counter32: 0
iso. = Counter32: 11015
iso. = Counter32: 0
iso. = Timeticks: (748634215) 86 days, 15:32:22.15

The first part is the object ID (OID), the second part is the data type, and the third part (after the colon) is the value. The OID is like a path to a value. What are these values? There's a database for them, and this page will show you the names for the network counters at iso.

That site has a ton of OIDs in it.
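
Each line of that output has the same `OID = Type: value` shape, so it's easy to post-process if you don't want a full monitoring package yet. Here's a rough sketch in Python; the full numeric OID in the sample line is my guess at what snmpwalk prints, since the numbers were trimmed in the listing above:

```python
def parse_snmpwalk_line(line):
    # split "iso.3.6... = Counter32: 1758591601" into (oid, type, value)
    oid, rest = line.split(" = ", 1)
    if ": " in rest:
        vtype, value = rest.split(": ", 1)
    else:
        # some values (like empty strings) have no type prefix
        vtype, value = "", rest
    return oid, vtype.strip(), value.strip()

line = "iso.3.6.1.2.1.2.2.1.10.1 = Counter32: 1758591601"
print(parse_snmpwalk_line(line))
```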

Armed with this knowledge, you should be able to program Cacti or MRTG to extract data from your router and graph it.

An init.d for OpenWrt

The old version of OpenWrt I'm using didn't create a script in init.d for me. So here's an init.d script, /etc/init.d/mini_snmpd:

#!/bin/sh /etc/rc.common
# Copyright (C) 2006

start() {
	mini_snmpd -i eth0.1,wl0,br-lan &
}

stop() {
	killall -9 mini_snmpd
}
It's based on init.d/cron.

You also need to run this:

cd /etc/rc.d/
ln -s /etc/init.d/mini_snmpd S50mini_snmpd
/etc/init.d/mini_snmpd start

That creates a symlink to cause mini_snmpd to start on boot. Then it starts the daemon. You can now log out of the router.

Mobile Phone Developer Sites

A couple mobile phone business and development links. One came from TechRepublic, speculating about who might buy (the newly revived) Palm.

Podcast: Will the $99 smartphone trigger a price war? [Guess not. It seems to be a price war at the $199 price point.]

Correcting BREW and J2ME - a 2008 article that gives background about the competing BREW and J2ME markets, and the then-emergent iPhone business model.

Links to misc app stores (mobile or not): Linspire CNR, GetJar, Boost Mobile, ATT MediaMall, Sprint Software Store, Handmark, Ovi (Nokia), Android Market, T-Mobile T-Zones, Motorola Solutions.

A bunch of development links after the jump.

A Random List of Stuff

Maemo from Nokia

Palm Pre

(Google) Android

Apple iPhone

Java ME


(Qualcomm) BREW

Symbian OS

Windows Mobile


Access NetFront

Geos (defunct OS)

Also, there are some higher-level application platforms.

Yahoo Blueprint


Motorola WebUI - one of a few different WebKit based solutions out there.

WebKit, Apple's browser engine, which is getting a lot of application features added.

Ansca Corona, an iPhone SDK that uses the Lua language.

GetJar mobile phone market stats: the summary is that 75% goes to MIDP2/CLDC1.1. The rest is mostly Symbian. So Java still dominates, but the iPhone is the emergent platform that is leading innovation.


Move Files into a Directory Named for the Modification Date

This script is being used to move files around in a Maildir. A bunch of spam goes into the "new" directory. When this script is run, it moves each file into a directory named for the date the file was last modified (its mtime).

#! /usr/bin/perl

# move files into directories named by date
# a file modified on 2009-07-11 will be moved into a directory named "2009-07-11".

$dir = $ARGV[0];

$dir = 'new' if ! $dir;

opendir DH, $dir;

while ($fn = readdir DH) {
        next if ($fn =~ /^\./);

        $filename = "$dir/$fn";
        ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,$atime,$mtime,$ctime,$blksize,$blocks) = stat($filename);

        ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime($mtime);

        $mday = sprintf('%02d',$mday);
        $mon = sprintf('%02d',++$mon);
        $year = 1900 + $year;
        $destdir = "$year-$mon-$mday";
        $destname = "$destdir/$fn";

        mkdir $destdir if (! -e $destdir);
        rename $filename, $destname;
}

closedir DH;

Moving from Backups to SAN/DRBD + Archival Backup

This article is intended as a think-piece for small offices and LANs.

As data storage needs increase, the volume of data will slowly overwhelm the ability to back it up.

BackupExec saving 500 GB of data over a gigabit ethernet link to a RAID NAS (Buffalo terastation running Linux with soft RAID 5 and Samba) takes 15 hours. Even with this miserable performance, I'd be surprised if I could get it to improve its performance by getting a better NAS - I suspect the real bottleneck is BackupExec, which probably does a lot to prepare the backup file.

So, in the real world, I'll be running one-day backups just before I need to back up one terabyte of data.

One possible solution for this is to use a Storage Area Network (SAN). A SAN is an enterprise-level tech that federates multiple file servers to behave like a single, large file server. The system is redundant, so losing a server doesn't destroy the network. SAN is a block level technology, and the network emulates a disk. The main problem with SAN is the high price.

An emergent alternative is DRBD, a redundant, network-mirrored storage layer that runs on Linux and BSD. It creates a block-level device backed by local storage, and that device is synchronized across the network with a remote copy.

In a typical network, the Linux or BSD system is accessed over Samba file sharing, which is a cross-platform file sharing solution that works with Windows. With some management scripts, it should be possible to create a file share that's redundant and has high availability.

This redundant system would be better than RAID. An entire machine could be removed from the system, and it would still be available.

With this higher level of reliability, it would be feasible to reduce the frequency of backups, even to the point of demoting backups to the role of archival backups. Additionally, backups could be performed against one machine, while file access for users is provided on the other machine, improving overall performance.

Data being written to disk is limited by the speed of the disk itself, typically a 7200 RPM drive. Even if you have gigabit ethernet and SATA2, you're limited by the speed of the disk. Even if you spend the money for 10,000 RPM disks, you're not going to overcome this limit in a significant way. And if you invest in 10-gigabit copper or fiber optic networking, you're still limited by 6 Gbit/s SATA3.

Real-world disk performance is around 60 MB/s, which works out to roughly 500 Mbit/s. That's around 50% of a gigabit ethernet link, and around 5x as fast as a 100 Mbit ethernet link. RAID further degrades disk performance on writes.

So, even if you remove the network and disk interface bottlenecks, you're going to be limited to around 60 MB/s, and that's before you consider other performance hits related to writing data. At that rate, the fastest possible backup of one terabyte of data is around four and a half hours.

Real world performance is significantly worse as noted above.
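The arithmetic here is easy to reproduce. A quick back-of-the-envelope estimate in Python, assuming a sustained 60 MB/s:

```python
# Estimate backup time from data size and sustained throughput.
def backup_hours(data_gb, throughput_mb_s):
    seconds = (data_gb * 1000.0) / throughput_mb_s  # GB -> MB, then seconds
    return seconds / 3600.0

# One terabyte at a realistic 60 MB/s:
print(round(backup_hours(1000, 60), 1))  # -> 4.6
```

Plug in your own link speed to see where the bottleneck really is.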


MySQL Optimization

Here's a noob-to-noob optimization trick. Suppose you have a database table with, say, 200,000 records, and you regularly select on multiple criteria. The rule of thumb is to put the most selective WHERE clause first, and the least selective last. The goal is to cut the search set down to something small, and then search through that smaller set. Get all the queries using this order, then create a composite index over the keys to speed up the search even more.

Here are some before and after shots, based on real queries (from sf-active):

select * from tb where display='t' and parent_id=0 and id > 198000 limit 0,30


select * from tb where id > 198000 and parent_id=0 and display='t' limit 0,30

This revision will now cause the first clause to eliminate most of the rows from the table, leaving only around 2,000 rows to scan. The second clause, parent_id, eliminates 50% of the remainder. Display='t' is the least selective clause.

Also, it wasn't noted, but there are already indexes for `display` and `parent_id`. So we aren't starting with absolutely nothing.

select * from tb where display='t' and parent_id=0 limit 0,30


select * from tb where parent_id=0 and display='t' limit 0,30

Also do this:

alter table tb add index (parent_id, display)

That looks virtually identical. Again, this is a real-world situation, where the query was built-up dynamically. The optimization here is that I created an index that will speed up the select. The index matches the order of the query, so the query optimizer will be able to find the optimization easily.

Additionally, it would be a good thing to put all the clauses in all the queries into this order, from most specific to least specific, to gain the maximum optimization. I suspect the query optimizer already does this automatically, but, being meticulous about this seems like good mental discipline.
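The manual reordering above can be expressed as a sort by estimated selectivity. The match fractions below are rough, hypothetical guesses based on the description earlier (about 1% of rows for the id range, 50% for parent_id, and most rows for display); a sketch in Python:

```python
# Order WHERE predicates from most to least selective.
# The match fractions are rough, hypothetical estimates.
predicates = [
    ("display='t'", 0.90),   # matches most rows: least selective
    ("parent_id=0", 0.50),   # eliminates about half
    ("id > 198000", 0.01),   # ~2,000 of 200,000 rows: most selective
]
ordered = sorted(predicates, key=lambda p: p[1])
where = " and ".join(p[0] for p in ordered)
print("select * from tb where " + where + " limit 0,30")
```

The sort puts the id-range clause first, matching the hand-reordered query.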

The real-world effect of this simple optimization, which took around two hours to complete, was dramatic. The slow query had been bogging down the server, with queries taking thousands of seconds to execute (or in our situation, to time-out, and require the admin to go in and kill the thread). Now, the query barely shows up in the process list, and the real-world speed feels like it takes less than five seconds to execute through the web (meaning, it includes dns lookup, tcp connection, and page rendering). Typically, it takes one second, and feels pretty fast.

See Also
Optimizing MySQL - Database Journal
MySQL Optimization - DevShed
Query Optimization

NVU - text fields for copy-and-paste

Here's how to create one of those text fields with HTML that the user's supposed to copy-and-paste into their page. It's not hard.

Create a form.

Add a Text Area. Give it a name, and set the rows and columns.

From the text area dialog, click on the "Advanced Properties..."

Click on the Javascript tab.

Add a property named "onclick" with the value of "this.focus();".

Click OK.

Click OK.

NOTE: I found a serious problem - NVU's code reformatting will cause the html code to break within myspace, because NVU inserts newlines. To fix the problem, you have to save out the source, join all the lines, and upload the file manually.

NVU Installer for Ubuntu Feisty Fawn

For some reason or other, they don't have NVU for Ubuntu 7. You can install it from the .deb file. Instructions are at:

NVU to Create HTML Code to Insert into Websites

NVU is a free (as in GPL) Hypertext Markup Language (HTML) editor available at

You can use NVU to generate bits of HTML code that you paste into a web form, like you do to update your profile, or to post an article. This can be done on any site that allows you to post in HTML.

Step by Step

1. Start up NVU.

2. Type in your text, adding any formatting you wish.

3. When you're ready to post, click on the "SOURCE" tab along the bottom of the editing area. You should see a screen with your text, surrounded and interspersed with text that looks like <this>. Those things are "HTML Tags".

Your goal is to extract some tags and text. To do this, first try to find the first <BODY> tag. It should be at around line 8 (the lines are numbered along the left edge of the editing area).

Then locate the closing (final) body tag, around one line before the end of the page. It should look like "</body>". The slash in front indicates it's a closing tag. (The other body tag was the opening tag.)

What you have to do is copy all the code between the body tags, but not including the body tags.

4. Most people click and drag to select the codes, then right-click and copy.

An alternative way is to click before the first character to select, then hold shift and click after the last character to select. Then, press control-C to copy.

5. Paste the code in your clipboard into the web form.
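The copy-between-body-tags step can also be done programmatically, which is handy if you do this often. A small sketch in Python (a simple regex is fine for NVU's well-formed output, though not for arbitrary HTML):

```python
import re

# Return everything between <body> and </body>, excluding the tags themselves.
def body_contents(html):
    m = re.search(r"<body[^>]*>(.*)</body>", html, re.IGNORECASE | re.DOTALL)
    return m.group(1) if m else html

print(body_contents("<html><body><p>Hello</p></body></html>"))  # -> <p>Hello</p>
```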


The best way to incorporate images into your posts is to first, prepare your images, and upload them to a server. This can be an image server, or your own server, or an image-host. (On Indymedia sites, you can see if they let you upload images in a batch, then post the story afterward.)

By putting them on a server first, you are giving each image a permanent, public location. That location is the image's URL. Then, you can include that image in the HTML. That's how HTML works - images are references to the image files.

Please note that this is completely different from MS Word. In Word and other word processors, images are included into the document. The image data becomes part of the document, and the image is a copy of the original file. This is a subtle but important difference. Please unlearn any preconceptions you might have learned from MS Word. Also, don't use MS Word to create your HTML. NVU is a better tool.

So, let's assume you've put the images onto a server somewhere, and know the URLs.

To place the image in your document:

1. Click where you want the image to be, and click the Image icon in the toolbar.

2. Type in the URL to the image in the Location: field.

If you are lazy like me, you just open up the image in your browser, then copy the URL from the address bar, and paste it into the Location field.

3. Type in a description into the Alternative Text box, or click "Don't use alternate text."

4. Click OK.
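The end result of those dialog steps is an ordinary <img> tag in the HTML source. A sketch of what gets generated (the URL and alt text here are placeholders):

```python
# Build the <img> tag that the image dialog produces.
def img_tag(url, alt=""):
    return '<img src="%s" alt="%s">' % (url, alt)

print(img_tag("http://example.com/photo.jpg", "a photo"))
```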

Net Art, Electronic Art, Computer Art, Web Browser Art, & Hybrids

This is a list of various art projects. People put their portfolios online. I apologize to anyone not listed, as this list was started in 2009, and there's been a lot of interesting work going on for a couple of decades. This is a personal list, not a comprehensive list.
Plagiarist (Amy Alexander)
The Sickly Season

Other Links
Rhizome Art Base

Operations on a Remote MySQL Server

A demo of how to incorporate SSH tunnels into a Python system administration script.

Like all sysadmins, I write scripts to automate routine operations. Lately, though, I have needed to write scripts that automate routine operations on a remote system, and we need the security barriers to be a little higher than in the "old days".

We're accessing our database through an SSH tunnel, rather than via a regular encrypted socket. (The SSH connection will eventually require key pairs, and disallow regular passwords.)

If you don't know what SSH tunnels are, there's an explanation below.

So, I need to create scripts that will automatically log in to the server, open a tunnel, connect to the database server through this tunnel, and then execute SQL statements. It turns out to be a little difficult... but after some effort, the following script did what I needed:

#! /usr/bin/python
import subprocess as sp
import MySQLdb
import traceback
import sys
from nbstreamreader import NonBlockingStreamReader as NBSR
import os
import signal

try:
    print "Connecting to"
    ssh_process = sp.Popen(['ssh','-L3308:localhost:3306',''],
        bufsize=0, stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.STDOUT )
except (ValueError, OSError):
    traceback.print_exc()
    sys.exit(1)

nbsr = NBSR( ssh_process.stdout )

# delay until we're really logged in
while ssh_process.poll() == None:
    output = nbsr.readline(0.5)
    if output:
        # print output.strip()
        # should probably run this before we try to start another one
        if (output.find('bind: Address already in use') != -1):
            print "Critical error, cannot bind to the address."
            print "Killing the errant process.  Please run this script again."
            # a sketch: lsof -t prints just the pid holding the port
            (pid, err) = sp.Popen(['lsof','-t','-i:3308'],
                stdout=sp.PIPE).communicate()
            pid = int(pid)
            os.kill(pid, signal.SIGQUIT)
            sys.exit(1)
        if (output.find('Welcome to Ubuntu') != -1):
            print "SSH connection established."
            break

try:
    print "Connecting to database"
    # the host, port, user, and password belong in a configuration file;
    # they point at the local end of the tunnel
    db = MySQLdb.connect(
        db='test_schema' )
except MySQLdb.Error as e:
    print e
    sys.exit(1)

try:
    print "Sending query."
    q = """SELECT name FROM test_table"""
    cur = db.cursor()
    cur.execute( q )
    name = cur.fetchone()
    print name[0]
except MySQLdb.Error as e:
    print e

print "Completed."

The nonblocking stream reader is at:

Note that this is test code. It's not production code. The passwords and other information should be in configuration files, not in the code.

Next step is to turn this into a decorator, so we can create the function to perform the database operations, and wrap it with code that will transform it to execute the operations remotely.

(It's also possible to do the encryption on the MySQL server's socket - and require that specific certificates are provided. I'm not certain if one is better than the other.)

SSH tunnels

SSH has a feature where it can forward a local port to a specific port on the remote machine, creating an encrypted tunnel for your traffic. This is done with the -L option. The following forwards port 3308 on the local machine to port 3306 on the remote machine; 3306 is what MySQL runs on:

ssh -L3308:localhost:3306

SSH manages this connection, and when you log out of the remote machine, the tunnel is also taken down. What's nice about this is, you don't have a socket permanently open. It's only available when you're logged on. You can also tunnel anything, so unencrypted services available only on the server can be used remotely. It's like a temporary VPN.

Here's a diagram showing SSH and SSH with a tunnel.

The script above uses the subprocess library to execute ssh, and build the tunnel.

    ssh_process = sp.Popen(['ssh','-L3308:localhost:3306',''], 
        bufsize=0, stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.STDOUT )

Outlook: Keeps Asking for Password

Here's one possible reason:

Here's a fix:

Basically, you log in, quit, then log in again. Repeat once more. I'm not sure what's going on.

PC Hardware Failures

I'm noticing some patterns in PC failure. Here they are.

Batteries fail, causing date errors, or worse, booting problems. These fail after 2-3 years, and can be replaced easily for around $10.

Hard drives fail after around 5 years, causing much pain. Laptop drives can fail after just a year, and tend to develop bad sectors due to mishaps with the laptop. For maximum happiness, replace the drives before they fail, and use the originals as archive drives.

Motherboards sometimes fail, but not on any predictable schedule.

Motherboards can fail if the capacitors dry out or start to bulge and explode. This is more common than it should be, but at the same time, all caps tend to fail after years of use.

(I've also seen small ethernet switches fail due to bad capacitors.)

Floppy drives occasionally fail, but, more often, the floppies fail. They last around 3 years, and then some stop working.

Power supplies fail. In PCs, the power supply is often the culprit when a computer doesn't work. Good PSs last for many years, but the stock ones often fail after around 3-5 years. End users can replace these.

The small switched power adapters used with hard drives and laptops fail, a lot. Most of the time, the problem is that the cable is bent and the wires within are broken. More intensive use, like in a server room, tends to lead to the adapters expiring from overwork. The real fix is to buy gear that has a big power supply with a fan.

Fans fail. These things spin until they start to rattle. They're cheap and easy to replace.

Monitors fail, but it's usually the power supply that goes out. If it's not that, then it's a goner.

Mice clog up. Use laser mice.

Keyboards get crumbs in them, and they lose keys. Some have intermittent electrical problems that lead to weird typing problems. These can be cleared up by disconnecting and reconnecting the ribbon cable connecting the keys to the controller.

Parity in Computer Data, What is It?

Parity, in computer data, is a bit that's set or unset so the total number of bits is either even or odd. It's an extra bit, and it's added as a check on the data. So, if the parity is not correct, you assume the data is bad.

It's often used in data communications, and was a very visible feature during the old modem and BBS days.

Even parity means that a bit is added so the total number of "1" bits is even. Odd parity means that a bit is added so the total number of "1" bits is odd. So this:

1010100

With even parity, is: 10101001
With odd parity, is: 10101000
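The parity bit can be computed by counting the "1"s. A minimal sketch in Python, reproducing the example above:

```python
# Compute the parity bit for a string of data bits.
def parity_bit(bits, even=True):
    ones = bits.count("1")
    if even:
        return "0" if ones % 2 == 0 else "1"
    return "1" if ones % 2 == 0 else "0"

data = "1010100"                            # three "1"s
print(data + parity_bit(data, even=True))   # even parity -> 10101001
print(data + parity_bit(data, even=False))  # odd parity  -> 10101000
```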

Back in the old days, ASCII had only 7 bits. (Indeed, it still has only 7 bits, but 8-bit character sets, with 256 or more characters, have come to dominate.) The 8th bit was used for parity.

Then, by the 1970s, modems with 8 data bits and a 9th parity bit were common.

The shorthand terminology that describes the number of parity and data bits, as well as stop bits, is still pretty common. A stop bit is an added bit that's low. It's like a pause. Here are some examples:

8N1 - 8 data bits, no parity, 1 stop bit.
7E1 - 7 data bits, even parity, 1 stop bit.
8O1 - 8 data bits, odd parity, 1 stop bit.

Parity bits are overhead, but help detect problems with the data. Of course, if the parity bit is also flipped, or two bits are flipped, the error won't be detected. That, however, is rare, because noisy connections tend to have a lot of errors.

Also, for larger file transfers, techniques like XMODEM and Kermit were invented. They would send the data, and then calculate a checksum. If the checksum failed, then the entire block of data was bad and a resend could be requested.

Ethernet protects its frames with a checksum (a CRC) rather than per-character parity.

RAID 5 uses parity, but in a different way. It uses any number of data bits, and one parity bit.

RAID 6 is like RAID 5, but adds another parity bit. This way, you can lose two drives and still recover.

Parity is sometimes misspelled "parody" by people who have poor spelling. Computer people may do the opposite, and spell parody as "parity" because they don't see the word "parody" anywhere. The two words are pronounced similarly.

Payment Card Industry Data Security Standard (PCI DSS), getting with the program.

These are notes for achieving conformance with PCI DSS. PCI DSS is a bit of private-market bureaucracy that basically amounts to an agreement to use secure practices, and to implement a system with security enabled, and insecure services and features disabled. The website was heavy on bureaucracy, and the technical info was hard to find. First, you need to get the PCI DSS standard, v.2.0. It's a PDF download. Next, get nmap on your server and your desktop. You have to scan the server over and over. With nmap, do this:
nmap -A -T4
You'll get output like this:
Nmap scan report for (
Host is up (0.072s latency).
rDNS record for
Not shown: 927 filtered ports, 64 closed ports
80/tcp   open  http     Apache httpd 2.2.17 ((FreeBSD) mod_ssl/2.2.17 OpenSSL/1.0.0d PHP/5.3.6 with Suhosin-Patch)
|_html-title: 403 Forbidden
110/tcp  open  pop3     Courier pop3d
|_pop3-capabilities: USER STLS IMPLEMENTATION(Courier Mail Server) UIDL PIPELINING LOGIN-DELAY(10) TOP OK(K Here s what I can do)
143/tcp  open  imap     Courier Imapd (released 2011)
443/tcp  open  ssl/http Apache httpd 2.2.17 ((FreeBSD) mod_ssl/2.2.17 OpenSSL/1.0.0d PHP/5.3.6 with Suhosin-Patch)
|_sslv2: server still supports SSLv2
|_html-title: Site doesn't have a title (text/html; charset=iso-8859-1).
465/tcp  open  ssl/smtp qmail smtpd
|_sslv2: server still supports SSLv2
|_HELP qmail home page:
993/tcp  open  ssl/imap Courier Imapd (released 2011)
995/tcp  open  ssl/pop3 Courier pop3d
|_pop3-capabilities: USER IMPLEMENTATION(Courier Mail Server) UIDL PIPELINING OK(K Here s what I can do) TOP LOGIN-DELAY(10)
8000/tcp open  http     Icecast streaming media server
|_html-title: Icecast Streaming Media Server
Service Info: OSs: Unix, FreeBSD
My first goal is to get rid of the SSLv2 warning. Some websites said this was a PCI violation. To do this, first read the mod_ssl docs. Then, you need to alter the configuration file a bit. My file was /usr/local/etc/apache22/extras/httpd-ssl.conf. I added this line to the global config:

SSLProtocol ALL -SSLv2

That enables all but the SSLv2 protocol, which is the oldest protocol and is considered insecure. The newer ones are SSLv3 and TLSv1. Also, alter the ciphers. Look for the SSLCipherSuite line and change to:

I'm not sure I have that right, but it's mostly about enabling TLSv1, and disabling the LOW and MEDIUM grade ciphers. "TLSv1" above is an alias for a number of different ciphers. See the SSLCipherSuite section in the mod_ssl docs for more information -- it's too complex to describe here. But, in short, negotiating an SSL connection involves several phases, and in each phase, you can use different ciphers. Some are considered stronger than others. Exchanging data with these ciphers requires that both the client and the server have the required programs to handle the ciphers. That's why there are choices -- the programs will try to work with what they've got, and also try to use the most secure ciphers. Your job is to disable the less secure protocol, SSLv2, and not include the less secure ciphers. Read the mod_ssl docs for more details and info on how to list available ciphers.

Next, you have to establish a new virtual server for the web store. This requires creating a new Apache conf file, using this default file as a template. The main thing about making an SSL site is getting those certificates, putting them in a safe place, setting the permissions, and getting the server to come up. Just for starters, get a certificate from or make a self-signed certificate. You can "upgrade" to a commercial certificate after you've configured the server correctly.

But, before you can do that, you need to allocate an IP address for the website. This is a limitation of Apache and OpenSSL, at this time. Until recently, there was no way to run name-based virtual hosts with SSL; the problem was that SSL was negotiated before the hostname was sent to the server, so you could only have one certificate per IP address. Today, there's a feature called Server Name Indication (SNI) that allows it. Read about gnutls and SNI, and Apache with SNI. Also read Wikipedia on SNI - it indicates that no version of IE on Windows XP supports SNI. Therefore we can't use SNI on the server.

We must use IP addresses for vhosting. Lock down the default virtual host. (I'm not sure if it complies with the export laws as stated in the agreement, but it probably does.)

Perl Watchdog Script for Apache

This is a rough watchdog script to restart apache on the local machine when the website gets slow. If a GET to the url fails, or takes longer than 60 seconds, the local web server is restarted.

I started to use this after installing a new version of Apache. The system hadn't been properly tuned, and the side effect was that Apache would be nearly wedged, but the rest of the system was merely slow. This happens when Apache or MySQL are getting wedged, but it's not due to the system overloading with too much traffic. Why does it happen? It's hard to say - it could be a configuration problem, a DOS attack, a hack, or a software error. Regardless, it's more important to keep the system responsive, so the service gets restarted.

This script needs some more logging so it can snapshot different system stats, like system load, memory, network connections, processes, etc. before it's really useful for figuring out why the server is unresponsive.

#! /usr/bin/perl

our $LOGFILE = '/var/log/watchapache.log';
our $URL = '';
our $APACHE_RESTART = '/etc/init.d/apache22 restart';
our $SLEEP = 600;


require WWW::Mechanize;
require Time::Progress;
require POSIX;

our $mech = WWW::Mechanize->new( onerror => \&failed );

our $p = new Time::Progress;

sub test_server() {
  $p->restart;
  $mech->get( $URL );
  my $elapsed = $p->elapsed;
  write_log( $mech->status() . " elapsed $elapsed");
  if ($elapsed > 60) {
    restart_apache();
  }
}

sub failed() {
  write_log( "Failed " . $mech->status() );
  if ($mech->status() eq '500') {
    ## assume it's a dead server
    restart_apache();
  }
}

sub restart_apache() {
  write_log("restarting apache");
  system( $APACHE_RESTART );
}

sub write_log($) {
  my $line = shift @_;
  if (-e $LOGFILE) {
    open FH, '>>', $LOGFILE;
  } else {
    open FH, '>', $LOGFILE;
  }
  print FH POSIX::strftime('%D %T', localtime);
  print FH " $line\n";
  close FH;
}

for(;;) {
  test_server();
  sleep( $SLEEP );
}

Phone Number to Call that Repeats the Line's Number to You

The telephone number that will tell you the number you're calling from is called an Automatic Number Announcement Circuit, or ANAC.

Listing of ANACs.

They're useful if you're trying to identify the number associated with a dial tone.

It's important to label ALL the jacks correctly, and use a scheme like A1 for analog phones, D1 for digital phones, and L1 for LAN (Ethernet) jacks. When you're testing jacks, start off by plugging in the highest-voltage, highest current device first. That's usually the PBX phones, which use 2 wires, 48V and enough current to power the chips inside.

Then, after that, come the regular POTS telephones, which ring at 90+ volts, are powered at 48V, and drop below 10V when off hook.

Last is Ethernet, which runs between 2V and 5V. If you're testing Ethernet with a computer, consider using an external adapter which you can sacrifice.

Generally, there's no hazard plugging a POTS phone into a PBX, but you might hear weird noises.

If you see a "harmonica", it's probably plugged into a RJ21, 50-pin "centronics" type plug. These will do 25 phones, at 2 wires per phone, or 12 with 4 wires per phone. These terminate at the "harmonica" or sometimes at a patch panel.

Printer Purchasing Advice

Computer printers suck. It's almost impossible to tell if you're going to get a good one, or a big dud. Generally, the good ones are expensive, and the losers are cheap. Some brands are better than others, but the models within a brand vary more than the models across a brand. There are good Brothers, and there are crappy HPs, even though people generally think of the Brother as inferior to HP.

Best bet is to buy at the midrange, for products aimed at small offices. Products at the low end aimed at the home market won't last. Also, there are sometimes some great bargains - but that could be because of design flaws.

Example: Samsung ML-2510. Works great when new, but it seems like a design flaw causes the unit to get hot, and the rubber parts to wear out quickly, necessitating cleaning and possibly a refurb.

On the other hand, I had a small Panasonic laser printer that lasted for years and printed thousands of pages. I eventually gave it away because it was a Mac printer and I went to the PC and Linux. The print quality was middling, and speed was slow, but the machinery was solid.

All new printers work great. Not all printers work after three years of steady use, but some will.

The old HP 4000 series and the 4 and 5 series printers were and are awesome. They easily last a decade, and require only one or two roller repairs in that time. These were office-grade printers with few features, and priced in the $800 range. You can find them for < $100 now. Only problem is that they require parallel printer ports or an ethernet card.

HP quality is variable - it's kind of like cars. Some years, they're good, other years, not so good. Midrange HPs sell on the used market for $200 - $400 and are easy to evaluate - read the reviews.

Epson and Canon are considered the standard inkjets. I don't like the lower-end models of either. The bottom office inkjets are okay, but the ones people want are the midrange ones that cost around $400.

Color laser printers don't look good, but are impressive when you hand out materials and there's a splash of color. The cost of toner is quite high, but less expensive than ink. Watch out for low-end printers: the toner is very expensive.

ProFTPd MySQL configuration tips

Setting up ProFTPd with MySQL isn't "hard" per se, but the most popular tutorial, at Khoosys, is kind of complex. It has users, groups, quotas, and a lot of accounting. So the tables are numerous and there are a lot of queries involved.

I ended up stripping out the groups and all the quota features, and went with the users and a little accounting. I can't share the details here, but you basically turn features off by reading the sample config file, and then start deleting unneeded lines. You can figure out which lines here:

My configuration had PAM enabled, and it caused some problems. So I disabled PAM, at least for now. It needs to be reconfigured to work with the FTP accounts.

During the configuration, it was difficult to debug. The way to debug is to stop the server (via /usr/local/etc/rc.d/ stop), and then run the server at the console, with debugging.

system# proftpd -n -d 2

That will cause the server to start, but instead of going into the background, it will send error messages to the console. Start up the server, and connect to it via FTP. You'll see it succeed, or fail with nice error messages. When you get an error, kill the process, fix the bug, and restart. Once the configuration is done, you can simply start the server the normal way (via /usr/local/etc/rc.d/ start).

Python Cheatsheet

When I shift languages, I sometimes make cheatsheets to speed up the transition to the new keywords and syntax. There's only so much shelf space in the brain. This is one I made for Python.

python.txt3.68 KB

Qmail vs. Exim4 for Spamming

I had to send out some mass emails, and because the site had been taken off the collective server, the only installed, configured copy of PHPList I had was on an internal server.

This server was configured, mainly to filter email and forward it to the main mail server. It was using Exim4.

The old server was (and is) using Qmail.

Exim seems to have a feature that causes mail to a domain to not be retried too often. It lets messages accumulate, and then sends them through in batches. This was a problem, because it caused the big domains to get floods of email.

Qmail seems to treat each message individually (more or less) and there's less bunching-up of messages. This was what I wanted, because I configured PHPList to trickle the email out fairly slowly.

Also, what Exim deems to be a failed message is difficult to ascertain. It seems like sending a message to a nonexistent mailbox will cause that domain's email to be held up. I'll have to double-check on that sometime.

RAID 5 Parity. What is it, and how does it work?

One morning, I started wondering how RAID 5 parity works to rebuild a disk array. It seemed "magical" to me that you can get redundancy and still use most of your disk capacity. So I searched for it... and turned up very little info, plus one other person's unanswered question. A few articles explained it, but with more detail about performance and less about the actual parity function. So that's why this page exists. The good articles were at:

MS TechNet
Tom's Hardware

What's the magic?

The short answer is "XOR". XOR is a binary operator that takes two inputs and produces one output. The rules are:

1 xor 1 = 0
1 xor 0 = 1
0 xor 1 = 1
0 xor 0 = 0

In English xor means "if there's a difference, the result is 1".
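
The truth table is easy to check with the bitwise XOR operator found in most languages. Here's a quick Python sketch (nothing RAID-specific is assumed):

```python
# Verify the XOR truth table using Python's bitwise ^ operator.
for a, b, expected in [(1, 1, 0), (1, 0, 1), (0, 1, 1), (0, 0, 0)]:
    assert (a ^ b) == expected
```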

Parity in RAID 5 involves reserving some space for parity information. Parity data is an additional bit of information that helps you recover lost data.

Another way to describe this parity is "even parity". That means we try to keep the number of "1" bits even. If there are 2 "1"s, the parity is "0". If there is only 1 "1", the parity is "1". In short:

1 1 parity 0
1 0 parity 1
0 1 parity 1
0 0 parity 0

(I had a little difficulty "getting it" because I was used to parity calculations on a network. If you're familiar with that, don't apply that knowledge too much. RAID parity is just a little bit different.)

How parity data is arranged on the disk

In practical terms, the data on the disk is stored in cylinders. A cylinder is all the data that passes under the disk head in one revolution. Cylinders are grouped into stripes, which are the same-numbered cylinder across all the drives. So, stripe 1 is the group of all the cylinder 1s across all the disks. (Note that this is the idealized RAID. Actual RAID is less tidy, but the stripe is the basic set of data that is protected.)

RAID calculates parity across cylinders, within a stripe. So if you have a three-disk RAID 5 array, and your data is on stripe 0, two of the cylinders hold data, and the third cylinder holds the parity.

Parity is calculated across the cylinders. (It's not calculated within a single byte, the way it is on networks.) So if cylinder 1 on disk 1 looks like this:

1 0 1 1 0 0 1 0

And cylinder 1 on disk 2 looks like this:

0 1 1 0 1 0 0 1

Then the parity looks like this:

1 1 0 1 1 0 1 1

Here are the three cylinders again, but closer together so it makes sense:

1 0 1 1 0 0 1 0
0 1 1 0 1 0 0 1
1 1 0 1 1 0 1 1

Notice how there are an even number of 1s in each column of bits. The parity is even.

That is how RAID can recover from a lost hard disk. If you replace a disk, you can rebuild it because you know that there should always be an even number of bits in each column.

If you lost the second disk, your data suddenly looks like this:

1 0 1 1 0 0 1 0
? ? ? ? ? ? ? ?
1 1 0 1 1 0 1 1

You can rebuild the second disk by setting the 1 and 0 bits based on our parity rule, that there should be an even number of 1s. Here's an example of regenerating the first four bits:

1 0 1 1 ...
0 1 1 0 ...
1 1 0 1 ...

All we do is repeat this calculation several billion times, and our data is rebuilt.
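
The whole cycle can be sketched in a few lines of Python. This treats each cylinder as a block of bytes with made-up contents; parity is the XOR of the data blocks, and a lost block is rebuilt by XORing everything that survived:

```python
# Sketch of RAID 5 style parity. Python's ^ operator XORs integers
# bit by bit, so XORing bytes computes per-column even parity.

def parity(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

disk1 = bytes([0b10110010])        # example cylinder on disk 1
disk2 = bytes([0b01101001])        # example cylinder on disk 2
p = parity([disk1, disk2])         # the parity cylinder

# Disk 2 dies. Rebuild it from the survivors: XOR of everything else.
rebuilt = parity([disk1, p])
assert rebuilt == disk2
```

The same rebuild works with any number of data disks, which is why the fraction of space lost to parity shrinks as the array grows.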

(A nice feature about RAID 5 is that, as you add more disks, you still retain only one parity cylinder per stripe, so, the fraction of space used for parity decreases. A 4-disk RAID array allows you to use 3/4 of the array for data, and 1/4 for parity.)

The main downside of RAID 5 is performance, and that issue is described in the linked articles, above.

Related Article
RAID 5 is less reliable than RAID 1, and it gets worse when you add more disks.

Random Integers (J2ME)

So, I'm studying J2ME, and for some reason (maybe the wrong version of CLDC?) I can't use Random.nextInt(n). I can't specify the range of the random number. What a pain.

I wanted to avoid doing floating point math, and fell back on a C trick. To get a random number from 0 to 500:

import java.util.Random;

Random r = new Random();
int myNum;
// keep 16 random bits, scale to 0..499, then shift back down
myNum = ((r.nextInt() & 0xffff) * 500) >> 16;

r.nextInt() returns a random int, which is 32 bits.

& 0xffff masks off the upper 16 bits, leaving 16 lower bits of randomness.

So our range of random numbers is 0 to 65535. We multiply this by our desired range.

>> 16 shifts the bits to the right. >> 16 is equivalent to dividing by 65536.

So, what this calculates is int((random_smallint / 65536) * range).

The trick is that we don't do any divisions, and we only multiply once, so it's probably faster.
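
Sketched in Python rather than J2ME Java, just to show the arithmetic (the function name and the range of 500 are mine):

```python
def scale16(rand_int, n):
    """Map a random int to 0..n-1 with the mask-multiply-shift trick."""
    # keep 16 random bits (0..65535), scale by n, then divide by 65536
    return ((rand_int & 0xFFFF) * n) >> 16

# The endpoints of the 16-bit range land on 0 and n-1.
assert scale16(0x0000, 500) == 0
assert scale16(0xFFFF, 500) == 499
```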

Picture 005.jpg (14.66 KB)

Reading Code with GVIM/VIM and ctags

Vim/GVim has great features to make it easy to read C (and PHP and Perl) code. They treat your code like a hypertext or web page, where you can jump between files easily.

First is the "go to file" feature. If you move the cursor onto a file name, and type gf, you'll be taken to that file. To return to the previous file, press Control-o. The file opens are kept on a stack, so you can drill down into file, and climb back out. This is useful for reading code with "includes".

gf : go to the file under the cursor, and push the current file onto the file stack.
Control-o : pop the file from the file stack and return to it.

Second is the "tags" feature. If you have a body of code written in C, you can use the "ctags" command to generate an index of function definitions. This is stored in a file named "tags" in the root directory.

On Debian or Ubuntu, to install ctags, do: sudo apt-get install exuberant-ctags

When you move the cursor onto a word, and press Control-], it looks up the word in tags, and opens up the file with the definition, and puts the cursor at the definition. The tags are kept on a stack, and you can back out of the file by pressing Control-t. This is useful if you're reading a header file or some code, and want to know what a function does.

Sometimes, a tag will have multiple definitions. You usually discover this when you use C-] and go to a useless line. You can find an alternative tag definition by typing :ts and picking the correct tag definition.

Alternatively, you can type g] and get a list of the tags defined for the word under the cursor. Use this if you're aware that the tag has multiple definitions.

Control-] : follow tag. Looks up the word under the cursor in the tags database, and opens the file where the tag is defined. Pushes the current file onto the tag stack.
Control-t : pop the file from the tag stack and open it.
:ts : show and select tag definitions for the word on the top of the tag stack. Use this if you follow a tag and the definition seems wrong.
g] : show and select tag definitions for the word under the cursor.

The exuberant-ctags package (on Debian), which also provides etags, is a ctags that works for PHP, Perl, Java and a lot of other languages.

When you install exuberant-ctags, it will install two commands, ctags and etags. Calling it as ctags will produce a "tags" file in the classic format, which is used by vim. Calling it as etags will produce a "TAGS" file in a different format used by emacs. So, use "ctags".

Here are some sample commands:

ctags -R
Recursively build the tags file from all the sources starting with the current directory.
ctags -R *
Run this in the Drupal root directory to build the tags file. Drupal's code uses the .inc suffix on files in the includes directory.

Generally, I'll create a script to refresh the tags file, and make sure ALL the files I'm using are included in the database. As the project grows, I re-run the script to rebuild the db. Simple.

Here's a script to rebuild the tags file for Drupal:

#! /bin/bash
cd ~/Src/drupal-6.19/
ctags -R *

Recreate a Dropped Table in Django Migrations

This is a somewhat embarrassing story, but one that's common enough that you can find it online: I dropped a table, and needed to recreate it, and I'm using those (grr) Django migrations.

The right way out of this mistake is to recover from a backup. But let's suppose I didn't have a recent backup of my development database... because I didn't.

So, here's the general technique:

Let's say you are stuck at a migration, and can't move forward. Migrate to one you can reach. Example: migrate appname 0018

Delete all the migrations after that.

Grep for your tablename in the migrations directory:

grep Report *

You will see where it's been created, and altered. You will need to edit these migrations so that the initial migration that created it matches your model.

You may need to edit the migrations that come after it, too, but in my experience this isn't strictly necessary... so don't bother at first.

Copy the migration that created the table (with your edits) to a new migration:


Edit the new file to delete everything except the migration to create your model.

Edit the "dependencies" list so that you create a dependency on the previous migration. If migration 0018 is named "0018_auto_20150411_0205", then the dependencies look like this (my appname is v1):

    dependencies = [
        ('v1', '0018_auto_20150411_0205'),
    ]

A fake migration file is attached. It's not the one I used, but it's an example of the syntax.

You can then run this migration: migrate appname 0019

Then, you need to delete this migration, and do a makemigration: makemigrations

That will combine all the new model changes, and create the migration. Apply the migration: migrate

If it fails (and mine did) look at the error message and make adjustments. You may have not altered the initial migration file to match the model, completely. If that's the case, alter the initial migration that created the table until it matches. Try migrating again.

At this point, it becomes a little tricky - you may need to delete the intermediate migrations that altered this table, too. It's even scarier than it sounds, but you should feel pretty comfortable looking at the migration file by now.

In the end, your code will be slightly altered so it appears as if today's model was initialized way back in the past.

One other thing - if you were having problems migrating in the first place, your database tables were probably out of whack. What I ended up doing was putting DROP TABLE and ALTER TABLE statements into a pane in MySQL Workbench, and running them to get a fresh start.

I wish I could end this with an encouraging paragraph about how great migrations are... but that's not how I'm feeling. I think that the Django style of centralizing all the changes in the code is kind of ass backwards. The database should be the central repository of truth about the data. Generating code that can alter a table to match a Django model sounds dangerous, because programmers are a lot quicker to change a model than to change a database table. A small error in code can turn into a migration that alters the data.

I think a smarter way to migrate is by creating migration scripts by examining table schemas, and generating SQL code to perform the changes.

Am I going to do that? NO. I'm using a framework to build an app. The app is the goal, not writing an extension to this feature-rich framework. It's not worth the effort to diverge from the system that's already in place, and which works adequately most of the time. If something goes wrong... I'll search for a solution and find this page, and follow the instructions :)

0019_fake.py_.txt (1.13 KB)

Removing Outlook Express from Windows

If you have a Win2k or WinXP install, you might consider removing Outlook Express, because it's a big security hole. Unfortunately, MS decided you really need MSOE, and attempting to delete the exe file will cause the exe file to reappear.

Here's a lengthy how-to about disabling msoe.

Newsflash: I feel stupid now. The real way to remove it is to go into Add/Remove Programs, click on the Configure Components icon, and then uncheck the OE option. You can also disable Windows Messenger and MSN Explorer.

Removing Rows from a Table with Access, for JBlast

The goal is to remove all the bad fax numbers from a JBlast fax list. This applies to any situation where you want to remove one list of data from another list of data. Another way to say it is that you have a full list, and you're trying to remove a sublist from the full list.

The way you do it is by taking the full list, and then doing a JOIN that will add a column that identifies the rows that are present in the sublist. You can do this with a LEFT JOIN. A LEFT JOIN includes all the rows in the left table, in this case, the full list. It matches on a key in the right table, and when there's a match, columns from the right table are included; when there's no match, the columns are set to NULL.
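
The same anti-join can be sketched outside Access. Here's a small Python/SQLite version with made-up names and fax numbers (the real query matches on the fax column the same way):

```python
# LEFT JOIN anti-join: keep full_list rows that have no match (NULL)
# in sublist, or whose match is merely "Line Busy".
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE full_list (name TEXT, fax TEXT)")
con.execute("CREATE TABLE sublist (fax TEXT, error TEXT)")
con.executemany("INSERT INTO full_list VALUES (?, ?)",
                [("Alice", "555-0001"), ("Bob", "555-0002"),
                 ("Carol", "555-0003")])
con.executemany("INSERT INTO sublist VALUES (?, ?)",
                [("555-0002", "No Answer"), ("555-0003", "Line Busy")])

rows = con.execute("""
    SELECT f.name, f.fax
    FROM full_list f
    LEFT JOIN sublist s ON f.fax = s.fax
    WHERE s.error IS NULL OR s.error = 'Line Busy'
    ORDER BY f.name
""").fetchall()

# Alice never failed (no match, so NULL) and Carol was merely busy;
# Bob's "No Answer" row is filtered out.
assert rows == [("Alice", "555-0001"), ("Carol", "555-0003")]
```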

The following image shows a left join in Access.

The left join is indicated by the arrow connecting the two tables. To create a left join, you drag column names to create a relationship line, then right click on the line and edit it.

The SQL is (sorta):

SELECT name, fax, error FROM full_list LEFT JOIN sublist ON full_list.fax=sublist.fax WHERE error IS NULL OR error='Line Busy'

To create a new list that includes only the rows NOT in the sublist, you filter so that you bring in NULLs. In this specific example, we're looking for NULLs and "Busy" numbers. Everything else is considered a bad number.

The output of this query can be saved out as a new list.

(Correction: For some reason, the raw undelivered.csv file has a space in front of each phone number. I had to create an extra query to remove it. I'll post the fix later.)

leftjoinquery.png (5.62 KB)

Research on Impulse Buying and Ecommerce

This is a summary of articles found during cursory Internet searches regarding impulse buying and ecommerce.

A more attractive printable version is posted here at Docstoc.

Most of these are “newbie” articles because I lack marketing experience. The list has been filtered for quality. Star ratings from 1 to 3 indicate perceived quality.

Most articles found were user-generated content, and focused on the topic of “how to avoid impulse buying.” They severely outnumbered explanations of how to induce impulsive shopping, so it appears that the retailers are winning, and the consumers feel relatively unable to resist. Only one article of this type is included as an example.

Impulse buying is defined as anything from a purchase that wasn't planned, to a purchase that someone felt compelled to buy, but felt regret or guilt about later. Impulse buying accounts for a significant fraction of sales, both in brick-and-mortar and online retail, ranging from around 20% to 40% of all purchases.

Impulse buying in brick-and-mortar stores is different from impulse buying online, as expected, but what they both have in common is “browsing”. Impulse purchases are made when a product is displayed to the shopper. While it is obvious that a shopper won't buy something he cannot see, both traditional and online stores often fail to design the floorplan or web-page to let the customer see more products as they look for what's on their shopping list. The “take away” point is that browsing environments must be created, planned, and maintained to encourage impulse buys.

Additional factors affect impulse shopping, but ecommerce web-sites are still relatively primitive, and they aren't all browseable yet.

Articles summaries and a list of articles follows.

How to Prevent Online Impulse Shopping *

This article was kind of lame. How do you avoid using credit cards online? Also, online is actually not that good for impulse buys because there's no “there” there. I include it here only to give an example of a typical article trying to help people deal with impulsive shopping.

My tips for reducing online impulse shopping.

The Impulse Buy ***

Everyone feels guilty after an impulse buy, but people who are less likely to buy impulsively feel guilty about it twice as long. Impulse buyers felt guilt, but by the next day, felt good. The more prudent buyers were more likely to follow up their impulse with a practical decision. This decision is not conscious.

How Retailers Lure You to Shop and Buy **

Describes some psychological things stores do to lull you into shopping: music, aroma, entrance, flow (a maze of products). Refers to Paco Underhill's Why We Buy.

Why People Buy Green ***

“Light green” shoppers, who make up 89% of the market, buy green items out of curiosity. “Dark green” make up only 9%, and do research before purchasing. 39% of light greens made the purchase decision at the store, versus 20% of dark greens. 15% of light greens were motivated by learning about the product, versus 29% of dark greens. (The writing in the article is convoluted.)

Do You Buy on Impulse? **

40% of video games buyers purchased a game on impulse in the past six months. Impulse buys are purchased more often by the younger and older buyers, and they pay $27 for the game (compared to $42 for planned purchases). 43% of buyers said they paid $10 to $20 for the game. Lower prices induce impulse shopping.

How to Use Impulse Buying Behavior to Boost Your Bottom Line ***

This article seems to be good. It's by Chintan Bharwada, author of the Loyalty & Customers blog.

Impulse Toy Purchases *

This is a funny article worth reading for entertainment value. He says, “they play on our parental insecurities and they know that cost is not going to prevent us from purchasing our children’s happiness...” I suspect they also get parents to purchase toys that they would want for themselves.

3 “Impulse Buy” Tactics for Membership Websites to Use in the Holiday Season *

12 Factors of Impulse – Natural Powers to Ignite Sales **

Status, curiosity, sense of urgency, fear of loss, sympathy factor (helping a cause), indifference factor (act indifferent), greed factor (competitive shopping), desire to gain, superiority factor, obligatory factor, limiting factor (limit number to purchase and people will buy more), “now factor” or getting things delivered quickly.

Brain-based triggers of impulse buying **

A neurological analysis of an impulse buy. Factors: loss aversion, immediate rewards, pleasure, diminished willpower, beating scarcity, and gaining the stamp of approval. Even if you understand the factors, you are still subject to their effects.

Impulse Buying Report **

A student group report from Pakistan, where private consumption expenditures have grown by 7.4% per year. (The average income during this report was only $925 per year per person.) Classifies types of impulse purchases. Notes effect of transaction size and shopping lists. Limited sample size, but results: > 20% impulse buy, women make more impulse purchases, larger shopping bill correlates with less impulse purchases. Snacks and frozen foods were popular impulse purchases.

Impulse purchase and e-commerce – Online Consumer Behaviors ***

This is an older paper from 2001 or 2002. 40% of online purchases are unplanned. 75% of buyers stated that the purchase was price-driven. Analysis uses the Consumption Impulse Formation and Enactment model (CIFE) by Utpal M. Daholakia, a model to understand impulse purchases. To create the ideal environment: category links, simple checkout, recommendation system, virtual checkout, product exposure, highlight feature products, bundles.

On the Negative Effects of E-Commerce: A Sociocognitive Exploration of Unregulated On-line Buying ***

An older paper from around 2001 by Dr. Robert LaRose, who specializes in media and telecommunications. This doesn't discuss ecommerce as much as psychological factors like addiction, compulsive behavior, and shopping. The ecommerce part seems dated.

See also: Media Now: Understading Media, Culture, and Technology

What causes customers to buy on impulse? **

2002 paper by User Interface Engineering. I think its results are distorted by the fact the participants were given money. Shoppers think of things to buy as they shop. 87% of money spent on impulse purchases resulted from category navigation. The other 13% came from using search. Site searches narrow focus too much. Well-designed navigation exposes customers to more products, resulting in more impulse buys.

2% conversion rate *

The site reviewed has achieved a 2% conversion rate (meaning that 2% of people who click to that site via a paid ad purchase something). Industry average is 1%.

Some other pages and papers I haven't had time to read and summarize: one looks difficult, but has lots of data; another is a presentation.

Resolve IP Addresses to DNS Names

Sometimes, you have textual data, like log files, with IP addresses. You sometimes want this data to show hostnames instead.

This script converts IP addresses in the standard input to hostnames. (The script is based on one by Diego Zamboni.)

#!/usr/bin/perl -w
# Resolve IP addresses in web logs.
# Diego Zamboni, Feb 7, 2000
# John Kawakami, May 12, 2008

use Socket;

# Local domain name
$localdomain = '';

while (my $l = <>) {
  if ($l =~ /^(.*?)(\d+\.\d+\.\d+\.\d+)(.*?)$/) {
    $pre = $1;
    $address = $2;
    $post = $3;
    if ($cache{$address}) {
      $addr = $cache{$address};
    } else {
      # pack the dotted quad for gethostbyaddr
      $addr = inet_aton($address);
      if ($addr) {
        $name = gethostbyaddr($addr, AF_INET);
        if ($name) {
          # NOTE: To ensure the veracity of $name, we really
          # would need to do a gethostbyname on it and compare
          # the result with the original address, to prevent
          # someone spoofing us with false DNS information.
          # For this application, we don't care too much,
          # so we don't do this.
          # Fix local names
          if ($name !~ /\./) {
            $name .= $localdomain;
          }
          $addr = $name;
        } else {
          $addr = $address;
        }
      } else {
        $addr = $address;
      }
      $cache{$address} = $addr;
    }
    print $pre.$addr.$post."\n";
  } else {
    print $l;
  }
}

To use it, save the code into the file "resolve", do a "chmod u+x resolve" on it, and then try the following:

last -10 | ./resolve

Risks of Web Services to Applications

Increasingly, applications are dependent on external web services. Web services are great - you can get current data on demand, inexpensively (relatively) because we can purchase it in small increments. Web services are typically not only data services, but also perform data processing functions.

Web services also represent a risk, because the services can be discontinued or change.

This is a diagram of a typical application that integrates web services with local data.

Suppose one web service goes away. Any features of the application that depend on the web service stop working. Either an error message will be thrown, or the features will lack data, or the data will be obsolete.

Suppose two services fail. More of the application will fail.

If there are dependencies between the web services - i.e., the application is a mash-up that combines data from two web services - then the failure of either service affects the other.

To relieve some of this risk, one uses caching. You create a local database that will hold copies of the remote data from the web service. Requests are routed to the cache, and if the data isn't present or is outdated, the request is forwarded to the web service.
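
A minimal cache-aside sketch in Python, assuming a fetch function that stands in for the web service call (the names and the one-hour expiry are placeholders, not any particular API):

```python
import time

class Cache:
    """Serve reads locally; call the remote service on a miss or stale hit."""
    def __init__(self, fetch_fn, max_age=3600):
        self.fetch_fn = fetch_fn
        self.max_age = max_age
        self.store = {}                   # key -> (timestamp, value)

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None:
            ts, value = entry
            if time.time() - ts < self.max_age:
                return value              # fresh local copy, no remote call
        value = self.fetch_fn(key)        # miss or stale: hit the service
        self.store[key] = (time.time(), value)
        return value

calls = []
cache = Cache(lambda k: calls.append(k) or k.upper())
assert cache.get("quote") == "QUOTE"     # first read hits the "service"
assert cache.get("quote") == "QUOTE"     # second read is served locally
assert calls == ["quote"]
```

A policy choice worth noting: during an outage, stale entries could be served past max_age as a degraded mode, rather than failing outright.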

The cache handles service outages well, but it doesn't handle changes to the services. The two most common changes are upgrades to the service, and changes to the company that provides the service.

Upgrades typically don't cause existing services to be discontinued. Companies will maintain the existing web service with some kind of adapter. However, as time marches on, a legacy system will eventually be discontinued once few customers use it. I don't know what the lifecycle for a legacy service is, but it probably depends on the company providing the service. Government agencies seem to support data formats for longer than ten years.

Startup companies seem to last around three years - and when the remains of a company are merged with another company, it's rare that legacy systems are maintained as-is (they only want the customers, not their extant systems).

Major changes to services will require changes to the application. Either an inexpensive shim will be developed to adapt the new data to the old cache system... or the cache system and the application's access to the cache will need to be rewritten.

If the service vanishes, then it will be necessary to find a replacement service.

The lifecycle of business software is growing. In the 1990s, it would have been reasonable to expect software to become obsolete in five years. Today, it's common to run software that's nearly a decade old. In finance, utilities, government, and other slow-changing institutions, software lifecycles are measured in decades.

So, it should be expected that all software lifecycles grow longer as the software becomes institutionalized.

Exposure to the risk of web services changing increases with the length of the software lifecycle, and the damage that change inflicts tends to grow geometrically when the web services are integrated together.

The local cache, and its behavior, are the only insurance against the inevitability of changing web services.

appdiagram.png (25.62 KB)

Screen Scraping With wget (and Mailarchiva)

I was testing a new product called Mailarchiva, and I misunderstood the instructions. The upshot was that a mailbox full of messages was moved into Mailarchiva, and I wanted to restore them to the mailbox.

Mailarchiva comes with a tool to decrypt its message store, but it didn't work. The problem was that the main product and the utility package got de-synchronized, and the one tool I needed stopped working (because a method's type signature changed). Also, despite being an open source project, they didn't have sources for the utilities posted, so I couldn't re-build the program to make it work.

Not being a major java programmer, I had a hard time coaxing the system to the point where it would run without an exception - problem was, the utility's libraries expected one format for the message store, and the server's expected another. It was getting really difficult.

I had some manually produced backups, but not of the current month. (I didn't follow my own advice not to test with live data.)

You just can't win, sometimes.

The solution, sort of, was to use the website downloader, wget, to interact with the app via its web interface, and use that to download the messages to files. Screen scraping.

First, I found a page with great examples:

Then, a quick visit to the wget man page:

Here's the short version of how to do it:

The first step is to figure out how to log in and get a cookie.

The second step is to figure out how to download the messages.

The third is to figure out the range of pages in the results, and then write a loop to recursively download the messages from each set.

Then, finally, copy the .EML files up to the server via Outlook Express.

Here's the long version:

First, you have to submit a web form, and get a session id in a cookie. Here's the command I used:

wget -S --post-data='j_username=admin&j_password=fakepass'

The URL given to the command is the IP address of my test installation.

The --post-data line lets you submit the login form, as if you were typing it in and submitting it. To find the URL to submit, you look at the source of the login form.

Then, you inspect the output, looking for the Cookie. Then, concoct a longer, more complex command to submit the search form:

wget --header="Cookie: JSESSIONID=62141726A04B7C8BDE24C32514EB19F3; Path=/mailarchiva" --post-data='criteria[0].field=subject&criteria[0].method=all&criteria[0].query=&dateType=sentdate&after=1/1/09 1:00 AM&before=12/18/09 11:59 PM&'

Note that we're passing the cookie back.

Inspecting the resultant file will reveal that the search worked!

Then, you try to download the attachments by spidering the links, and downloading files that end in .eml.

wget -r -l 2 -A "**" -A "**" -R "" -R "" -R "" --header="Cookie: JSESSIONID=62141726A04B7C8BDE24C32514EB19F3; Path=/mailarchiva" --post-data='criteria[0].field=subject&criteria[0].method=all&criteria[0].query=&dateType=sentdate&after=1/1/09 1:00 AM&before=12/18/09 11:59 PM&'

That pretty much does what I want, but, I need to do it for a bunch of pages. The quick solution is to use the browser to find out what the last message is, and then write the following shell script:

for i in 1 2 3 4 5 ; do
wget -r -l 2 -A '**' -A '**' -R '' -R '' --header='Cookie: JSESSIONID=62141726A04B7C8BDE24C32514EB19F3; Path=/mailarchiva' --post-data='criteria[0].field=subject&criteria[0].method=all&criteria[0].query=&dateType=sentdate&after=1/1/01 1:00 AM&before=12/18/09 11:59 PM&page='$i
done

Note that a parameter was added to the post. It's page.

A parameter was also removed, the submit value. Submitting the old value seemed to prevent the paging. There's probably a branch in the code based on the type of "submit" you're sending, because there are a few different buttons, with different effects.

Again, that's discovered by reading the sources and experimenting.

So, I ran the script and waited a long time. Then, I shared the data via Samba (I coded this on a Linux box, but ran the application on Windows). A nice side effect was that the shared files displayed DOS 8.3 filenames, so the messages, which originally had long names, became things like "BADJFU~5.EML".

To upload, I used Outlook Express. Despite its bad reputation, OE is good at interacting with IMAP mailboxes, and its support for the .EML file format seems to be good.

Wget saved the day (but it was a long day).

Lesson learned. Or rather, "lessons refreshed" is really what happened.

I should have set up a test account, put mail into it, then archived it.

Additionally, I should remember that when dealing with "enterprise" software, it's not going to work like Windows or Mac (or even Linux) software. Larger businesses are assumed to have certain processes that SOHO businesses don't.

This would be a perfect application for a web service. It would avoid all the program execution problems. Instead of accessing the data through command line application, access it over the network, using a simple interface.

Additionally, this kludgy rescue would have been impossible if the application had been written to use a Swing GUI or a native GUI. The web interface made it possible to scrape the data out of the system.

As for Mailarchiva - if you are trying to archive your own mail server, it seems to be a good product. The docs could use some work :) I found others, but Mailarchiva running on a Linux box would probably be the most stable solution. The bad news is that it's not intended for archiving personal email accounts like Gmail, AOL and ISP accounts. So, it wasn't the right tool for me.

What I really need is a free/cheap archiver for products like Gmail. It would both mirror and archive the IMAP folders, but allow the user to hold on to emails for as long as they wanted. So far, what I've found either doesn't do folders, or doesn't do archiving. Archiving is just saving every single email it sees, and retaining messages even if they're deleted.

Security Logic

Came up with this comment to help me think through end-user security.

	 * Security logic is based roughly on NTFS style allow and deny.
	 * The logic is as follows, in order:
	 * 1. If a specific role or user is in the deny list, they are denied.
	 * 2. If a specific role or user is in the allow list, they are allowed.
	 * 3. Otherwise, they are denied.
	 * There are three special values.  Anonymous is a user who is not logged in.
	 * All refers to all roles and users.
	 * None refers to no roles and no users.
	 * The default value of the "deny" list is "None".
	 * The default value of the "allow" list is "None".
	 * Here are some common recipes.
	 * If you just want to allow specific roles to have access, define only the "allow" list.
	 *   allow: A B C
	 * If you want to specify only one role to deny, but allow everyone else:
	 *   deny: A 
	 *   allow: All
	 * If you want to temporarily restrict a role, add it to deny, but don't remove it from allow:
	 *   deny: B
	 *   allow: A B C
	 * This is similar to Apache's Allow,Deny mode.  Unlike Apache, you cannot specify the
	 * order of tests.  This is a feature, not a bug.
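
Here's a minimal Python sketch of the same logic (the function and its argument names are mine):

```python
# Deny-then-allow check, following the three rules in the comment above.
def is_allowed(identities, deny=("None",), allow=("None",)):
    """identities: the user's name plus all of their roles."""
    if "All" in deny or any(i in deny for i in identities):
        return False                     # rule 1: the deny list wins
    if "All" in allow or any(i in allow for i in identities):
        return True                      # rule 2: explicit allow
    return False                         # rule 3: default deny

# Recipe: allow only specific roles.
assert is_allowed(["bob", "A"], allow=("A", "B", "C"))
# Recipe: deny one role, allow everyone else.
assert not is_allowed(["eve", "A"], deny=("A",), allow=("All",))
assert is_allowed(["bob", "B"], deny=("A",), allow=("All",))
# Recipe: temporarily restrict role B without editing the allow list.
assert not is_allowed(["bob", "B"], deny=("B",), allow=("A", "B", "C"))
```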

Share a Printer from XP Pro to Vista Home

It's pretty difficult to share a printer from an XP Professional machine participating in a domain to a cheap laptop running Vista Home (Basic). There are a lot of things to get right, or the entire system won't work. Windows has a lot of granular security that can trip you up.

* Make sure the printer is shared on XP Pro. From XP Pro, go to \\machinename and see that the printer is shared.

* Set the Vista Home (or Windows 7 Home) laptop's workgroup to the same name as the domain.

* Turn on network discovery and sharing on Vista or 7. (This may not matter - but it can help you spot problems like nonexistent computers on the network.)

* Make sure that any firewall on the XP Pro machine allows Windows file and print sharing through.

* Make sure that you can ping from Vista Home to the XP Pro machine. Laptops connecting via WiFi may not be on the same network!

* If access is anonymous, make sure the XP machine's Guest account (not a domain account) is active. If access is with a username (the more common situation), create accounts on the XP machine whose usernames and passwords match the Vista accounts. You create these accounts in Control Panel -> Users.

* Make sure that access from the network is allowed. This is set in Control Panel -> Administrative Tools -> Local Security Policy. Look in Local Policies -> User Rights Assignment.

* You may need to get drivers for Vista or Windows 7. When you install the printer, you should use one of the "universal" or "global" print drivers offered by vendors.

Once you have access, double click on the printer icon and the drivers should install. When completed, the icon should appear in your printers. Print a test page.

Related solutions:
Vista - Sharing Printer Across XP Pro and Vista Home Premium
Windows XP Home to Vista printer sharing problem
Windows Vista <--> XP home networking succesfully resolved!
cant connect to xp pro share from vista home premium
Cannot share printer between Vista and XP
Vista Home Basic cannot connect to a shared XP HP 1000 printer

Shared Memory Example

Here's one for the noobs (from a noob). This demonstrates the use of shared memory. It's a program that spawns 10 children, and each one gets a special "babytalk" word to say. Each waits a random amount of time, and then writes its word into shared memory. Each child loops forever.

The parent loops forever, and every two seconds, prints whatever is in shared memory. The last child to write to memory "wins" and is "heard" by the parent.

Shared memory is a file that's treated like memory (or memory that happens to be written to a file). The filename is the name of the memory. You use mmap() like you would use malloc().

This is useful because you can duplicate a data structure across processes. (I'm thinking of using it for a kind of "scoreboard" where child processes write their results into shared memory.)

Here's the code:

#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

struct d {
  char word[20];
};

char babytalk[10][6] = {
  "baa", "boo", "waaah", "urp", "eep", "naah", "yeee", "coo", "guh", "ooh"
};

void child_babble(struct d *shared, char *word)
{
  // each child loops forever: wait a little while, then write its word
  while (1) {
    sleep( rand() & 0x5 );
    strcpy( shared->word, word );
  }
}

int main() {
  struct d *shared;
  int fd;

  // create and size shared memory
  fd = open("/tmp/sharedmem", O_CREAT|O_TRUNC|O_RDWR, 0666);
  printf("fd: %d\n", fd);
  lseek(fd,sizeof(struct d)-1,SEEK_SET);
  write(fd, "", 1);  // write one byte at the end to set the file's size

  // turn the file into shared memory
  shared = mmap( NULL, sizeof(struct d), PROT_READ|PROT_WRITE,
                 MAP_SHARED, fd, 0);
  if (shared==MAP_FAILED)
    printf("ERROR: %d\n", errno);
  printf("shared: %p\n", (void *) shared);

  // spawn 10 children
  int i;
  for (i = 0; i < 10; i++) {
    if (fork()==0)
      child_babble( shared, babytalk[i] );
  }

  // parent: every two seconds, print whatever is in shared memory
  while (1) {
    printf("%s\n", shared->word );
    sleep( 2 );
  }
}
mem.c (1.15 KB)

Simple Templating Language in PHP

A few years back, there was a trend in the PHP community to make alternative templating languages that ran inside PHP. This was so the designers could create HTML templates, and include bits of code to display data. The best was probably Smarty.

After a while with this, a counter-trend emerged that rejected adding yet another language to the system. After all, PHP is a templating language. Some web frameworks used PHP as the templating language, but simply asked that only a tiny subset of the syntax be used. CodeIgniter and Savant did this. (So did the never-released Slaptech code generator.)

I was firmly in this latter camp. There are already too many languages involved with PHP: PHP, JavaScript, HTML, CSS, and XML. Templating systems are slower, too.

The world has changed, though. Today, due to AJAX, you need to produce lists of data encoded as XML, or as fragments of HTML.

You can easily do this with regular PHP... except that PHP can sometimes look sloppy, and leave you wanting a simple templating language. What's below is an extremely limited templating language, implemented in a single function.

Additionally, there are two more functions that will apply the template to arrays and iterators.

If you copy this code to a file, and run it on the server, it'll demo each function. More programming blather after the code.

<?php
# an extremely minimalist templating language
# $tpl = 'text{interpolate}text';
# $output = tpl_merge( $tpl, array('interpolate'=>'text'));
# // $output is 'texttexttext'.
echo tpl_merge( 'Hello, {name}.', array( 'name' => 'world' ) );

function tpl_merge( $t, $v ) {
    $o = $t;
    $find = array();
    $repl = array();
    foreach ($v as $var=>$val) {
        $find[] = '{'.$var.'}';
        $repl[] = $val;
    }
    $o = str_replace( $find, $repl, $o );
    return $o;
}

# values can themselves contain placeholders; str_replace applies the
# pairs in order, so {name} expands first, then {first} and {last}
echo tpl_merge( '<p>Hello, {name}.</p>', array(
    'name'=>'{first} {last}',
    'first'=>'Jane', 'last'=>'Doe' ) );

# a template merger that applies the template to an array of arrays.
echo tpl_merge_array( '<p>Hello, {name}.</p>', array(
    array( 'name' => 'Gloria' ),
    array( 'name' => 'Steve' ) ) );

function tpl_merge_array( $t, $a ) {
    $o = '';
    foreach ($a as $element) {
        $o .= tpl_merge( $t, $element );
    }
    return $o;
}

# A similar template merger that works with iterators.
# An iterator is defined, minimally, as an object that has a next() method
# that returns the next item, and null past the last element.

$c = new Collection();
$c->add( array( 'name' => 'Gloria' ) );
$c->add( array( 'name' => 'Steve' ) );
echo tpl_merge_iterator( '<p>Hello, {name}.</p>', $c );

class Collection {
    var $a;
    function Collection() { $this->a = array(); }
    function add( $a ) { $this->a[] = $a; }
    function reset() { reset($this->a); }
    function next() {
        $val = current($this->a);
        if ($val === false) return null;
        next($this->a);
        return $val;
    }
}

function tpl_merge_iterator( $t, $it ) {
    $o = '';
    while ( $a = $it->next() ) {
        $o .= tpl_merge( $t, $a );
    }
    return $o;
}

So, clearly, you can use these functions to build pages in a functional-language style. Just define templates and immediately apply them to iterators that wrap around queries. Producing html or xml from queries is simplified. Best of all (for me) you can write more code in a functional style than in the dreaded OO style.

echo tpl_merge_iterator( 'template{here}', query('select here from foobar where here>100') );

It's not really that terse, but, the idea is, you're not writing any more loops. All that is hidden.

Stop Recording Bash History

Here's a script based on the information at

It erases your history, and then tries to alter /etc/profile to stop recording history for everyone. Run it as a user and as root for the full effect.
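Since the script itself is only attached, here's a rough sketch of what it does for a single user. This is an assumed reconstruction, not the attachment; the exact commands in the original may differ.

```shell
#!/bin/sh
# Sketch: erase the current user's bash history and stop recording it.
history -c 2>/dev/null || true       # clear in-memory history (bash builtin)
rm -f "$HOME/.bash_history"          # remove the history file
# tell bash not to save history at future logins
echo 'unset HISTFILE' >> "$HOME/.bashrc"
# run again as root, appending the same line to /etc/profile, to cover everyone
```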

fix_bash_history (444 bytes)

Strip Non-Numeric Characters from Data

This Javascript widget strips non-numeric characters from the input. The result will be a space-separated list of numbers. This is useful for extracting information from log files, dumps of data, and similar text.

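The interactive widget isn't reproduced here, but its logic amounts to something like the following sketch. The function name is mine, not from the original page.

```javascript
// Sketch of the widget's logic: replace every run of non-numeric
// characters with a single space, yielding a space-separated number list.
function stripNonNumeric(text) {
  return text.replace(/[^0-9]+/g, ' ').trim();
}

console.log(stripNonNumeric('error at 10:42, code 500 (line 17)'));
// → "10 42 500 17"
```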

Telephone Number Normalizers: fix phone numbers into a common format

It's common to get a list of names and phone numbers in a spreadsheet or from the web, and the formatting varies. In the US, people don't use a standard format consistently. Lately, they have taken to making phone numbers look like domain names or IP addresses, for example: 415.555.1212. This function normalizes phone numbers to look like this: 213-555-1212 x1234. The code is structured so multiple regexes perform the matching, allowing for easier modification. (This code was written in Excel, but should work in any VBA application.)
' Convert almost any phone-like string into a normalized form.
' The form is AAA-EEE-NNNN xPBXX
' This works only for US telephone numbers, but it's structured so
' it's not too hard to alter for other formats (or other idiosyncratic
' data entry persons).
' Requires Microsoft VBScript Regular Expressions 5.5
Function NormalTel(Phone As String, Optional areacode As String) As String
    Dim parts(4) As String
    Dim re As RegExp
    Dim mat As MatchCollection
    Dim phAreacode As String
    Dim phExchange As String
    Dim phNumber As String
    Dim phExtension As String

    Phone = RTrim(Phone)
    Phone = Replace(Phone, Chr(160), " ") ' replace nbsp with regular space

    ' no areacodes
    Set re = New RegExp
    re.Pattern = "^(\d\d\d)[ .-](\d\d\d\d)[.,]*$"
    Set mat = re.Execute(Phone)
    If mat.Count > 0 Then
        If (areacode <> "") Then
            phAreacode = areacode
            phAreacode = "213"
        End If
        phExchange = mat(0).SubMatches(1)
        phNumber = mat(0).SubMatches(2)
        phExtension = ""
    End If
    re.Pattern = "^(\d\d\d)[ .-]*(\d\d\d\d)\s*x(\d+)[.,]*$"
    Set mat = re.Execute(Phone)
    If mat.Count > 0 Then
        If (areacode <> "") Then
            phAreacode = areacode
            phAreacode = "213"
        End If
        phExchange = mat(0).SubMatches(1)
        phNumber = mat(0).SubMatches(2)
        phExtension = ""
    End If
    ' no pbx extensions
    '(123) 456-1234
    re.Pattern = "^\((\d\d\d)\)[ ]*(\d\d\d)[ .-](\d\d\d\d)[.,]*$"
    Set mat = re.Execute(Phone)
    If mat.Count > 0 Then
        phAreacode = mat(0).SubMatches(0)
        phExchange = mat(0).SubMatches(1)
        phNumber = mat(0).SubMatches(2)
        phExtension = ""
    End If
    re.Pattern = "^(\d\d\d)[.-](\d\d\d)[ .-](\d\d\d\d)[.,]*$"
    Set mat = re.Execute(Phone)
    If mat.Count > 0 Then
        phAreacode = mat(0).SubMatches(0)
        phExchange = mat(0).SubMatches(1)
        phNumber = mat(0).SubMatches(2)
        phExtension = ""
    End If
    ' with pbx extensions
    '(123) 123-1234 x1234
    re.Pattern = "^\((\d\d\d)\)[ ]*(\d\d\d)[ .-](\d\d\d\d)[, .]*(x|ext|ext.)[ ]*(\d+)$"
    re.IgnoreCase = True
    Set mat = re.Execute(Phone)
    If mat.Count > 0 Then
        phAreacode = mat(0).SubMatches(0)
        phExchange = mat(0).SubMatches(1)
        phNumber = mat(0).SubMatches(2)
        phExtension = mat(0).SubMatches(4)
    End If
    re.Pattern = "^(\d\d\d)[ .-](\d\d\d)[ .-](\d\d\d\d)[, .]*(x|ext|ext.)[ ]*(\d+)$"
    re.IgnoreCase = True
    Set mat = re.Execute(Phone)
    If mat.Count > 0 Then
        phAreacode = mat(0).SubMatches(0)
        phExchange = mat(0).SubMatches(1)
        phNumber = mat(0).SubMatches(2)
        phExtension = mat(0).SubMatches(4)
    End If
    If (phExtension <> "") Then
        NormalTel = phAreacode & "-" & phExchange & "-" & phNumber & " x" & phExtension
        NormalTel = phAreacode & "-" & phExchange & "-" & phNumber
    End If
    ' No number was detected; lose the dashes.  Copy the input if it didn't get detected.
    If NormalTel = "--" Then
        If (Phone <> "") Then
            NormalTel = Phone
        Else
            NormalTel = ""
        End If
    End If
End Function
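The same multi-pattern approach translates readily to other languages. Here's a compact sketch in Python; the patterns and the default "213" area code mirror the VBA above, but this is an illustration, not a drop-in replacement.

```python
import re

# Patterns for full numbers, with and without a PBX extension.
PATTERNS = [
    r'^\((\d{3})\)\s*(\d{3})[ .-](\d{4})$',                          # (213) 555-1212
    r'^(\d{3})[ .-](\d{3})[ .-](\d{4})$',                            # 213.555.1212
    r'^\((\d{3})\)\s*(\d{3})[ .-](\d{4})[, .]*(?:x|ext\.?)\s*(\d+)$',
    r'^(\d{3})[ .-](\d{3})[ .-](\d{4})[, .]*(?:x|ext\.?)\s*(\d+)$',
]

def normal_tel(phone, areacode='213'):
    phone = phone.replace('\xa0', ' ').strip()  # nbsp -> space, trim
    # seven-digit numbers get the caller's default area code
    m = re.match(r'^(\d{3})[ .-](\d{4})$', phone)
    if m:
        return '%s-%s-%s' % (areacode, m.group(1), m.group(2))
    for pat in PATTERNS:
        m = re.match(pat, phone, re.IGNORECASE)
        if m:
            g = m.groups()
            out = '%s-%s-%s' % (g[0], g[1], g[2])
            if len(g) > 3 and g[3]:
                out += ' x' + g[3]
            return out
    return phone  # unrecognized: pass the input through unchanged

print(normal_tel('415.555.1212'))         # → 415-555-1212
print(normal_tel('(213) 555-1212 x99'))   # → 213-555-1212 x99
```

Trying each pattern in turn keeps the idiosyncratic formats isolated, so adding a new one is a one-line change.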

The Value of Practice

Nothing beats practice. No amount of reading documentation and theory will teach as much as that same material combined with a system to play on. A good tutorial is even better.

I really undervalued this until recently, when I started to set up our new network. While theory was good, using the hardware sped up my learning by an order of magnitude. If I had to put a value on it, I'd say that it's worth around $1,500 of my time to buy something rather than merely study the documentation. (Of course, without the docs, it's pretty pointless - you learn to use the gear like someone who doesn't read the docs.) So I could do 3 or 4 nights with the docs, but at that point, I need the system.

I'm learning LVM, and using a tutorial (or two) is teaching me a ton. LVM is kind of complex, because it's a couple layers of indirection between the logical volumes and the physical disk.

With indirection, the tradeoff is usually between flexibility and complexity -- the more flexibility you get, the more complex it is to comprehend. The only way to get a handle on how complex a system is, is to use it. So, only by practicing on LVM, and trying different levels of complexity, can I get a hint of what is probably "too complex".

That's why it's necessary to sit down and program rather than read about programming, and sit at a computer or virtual machine instance, and play around with the system. Maybe the granularity of the object model is too fine, or maybe it's not. Maybe applying many functions to the array is okay, once you really slow down and read it. Maybe RAID5 is good, and maybe it's not.

Turning California WARN PDFs into Text

This was an odd project: taking several PDFs of layoff data and turning them into text, so they could be used more like a database. This info should be offered up by the state as a database, but it isn't (at least it wasn't to me). I ended up using a PDF-to-text application to generate text files, then wrote these scripts to scrape the data out of the text. My goal was to dig up all the unionized workplaces.

The WARN act is a law that requires employers to give 90 days notice of any coming mass layoffs. I don't recall the exact numbers, but it applies to businesses that have a pretty large number of workers.

These scripts are basically complete, but running them requires moving them into the right directories. Study the sources to figure this out.

(splits the text file into individual records)

#! /usr/bin/perl
open FH, ">/dev/null";
while (<STDIN>) {
	if ($_ =~ m#([^ ]+.*[^ ]+?).+?(\d+?)\s+(\d+/\d+/\d+)\s#) {
		# a line with company, count, and date starts a new record
		$comp = $1;
		$count = $2;
		$date = $3;
		$comp =~ s/[^\d\w]+/-/g;
		$comp =~ s/[-]+$//;
		$date =~ s/[^\d]/-/g;
		$name = "$comp.$count.$date.txt";
		open FH, ">splits/$name";
		print FH $_;
	} else {
		print FH $_;
	}
}
close FH;

(read each file and extract the interesting parts)

#! /usr/bin/perl

$line = <STDIN>; ## 1st line is the company, count, date, and part of the
                 ## location.  split on ctl-U character.
$line =~ s/[\r\n]//g;
$line =~ s/ $//g;
chomp $line;
#print "**$line**\n";
$company = ( $line =~ /(.+?)\s+?\d+/ )[0];
if ($company !~ /\cU\cU/) {
	$company =~ s/\s+$//g;
	$line2 = <STDIN>;
	$line2 =~ s/\s\cU\cU\s*//g;
	$company .= " $line2";
	$company =~ s/\s+$//g;
}
$company =~ s/\s+\cU\cU//g;
#print "**$company**\n";
($count, $date, $location) = ( $line =~ m#$company[\s\cU]+?(\d+?)\s+?(\d+?/\d+?/\d+?)\s+?(\w.+?)$# );
#print "**$count**$date**$location**\n";

$line = <STDIN>;
$line =~ s/[\r\n]//g;
chomp $line;

($street, $location2) = ( $line =~ /(.+?)\s+?\cU\cU\s+([A-Z ]+?)$/ );
##print "**$street**$location2**\n";
if (! $location2) {
	($street) = ( $line =~ /(.+?)\s+?\cU\cU/ );
	##print "**$street**\n";
	$location = $location . ' ' . $location2;
}
#print "**$street**$location**\n";

$line = <STDIN>;
$line =~ s/[\r\n]//g;
chomp $line;

($city, $state, $zip) = ( $line =~ /^([\w ]+?), (\w\w) ([\d-]+?)$/ );
#print "**$city**$state**$zip**\n";

if (! $zip) {
	($city, $state, $zip, $extra) = ( $line =~ /^([\w ]+?), (\w\w) ([\d-]+?)\s+(.+)$/ );
	$location .= ' '.$extra;
}

while($line = <STDIN>) {
	last if ($line =~ /^Company Contact Name and Telephone Number/ );
	$line =~ s/[\r\n]//g;
	chomp $line;
	#print "**$line**\n";
}

$line = <STDIN>;
$line =~ s/[\r\n]//g;
chomp $line;
($cname, $layoff_or_closure) = ( $line =~ /^(.+?)\s+?\cU\cU\s+?Layoff or Closure:  (\w+?)$/ );
#print "**$cname**$layoff_or_closure**\n";

$company_contact = <STDIN>;
$company_contact =~ s/[\r\n]//g;
chomp $company_contact;
#print "**$company_contact**\n";

while( ($line = <STDIN>) !~ /^Union Representation/ ) {
	## accumulate contact info here
}
$line =~ s/[\r\n]//g;
chomp $line;
#print "1**$line**\n";

while($line = <STDIN>) {
	last if ($line =~ /^Name and Address of Union/);
}

$union_contact = "";
while($line = <STDIN>)
	goto CONT2 if $line =~ /^Job Title/ ; 
	$line="" if ($line =~ /Name and Address of Union Representing Employees/);
	$line =~ s/[\r\n]//g;
	chomp $line;
	$union_contact = "$union_contact\r\n$line" if ($union_contact ne "");
	$union_contact = $line if ($union_contact eq "");
#print "**$parts**\n";

print "\"$company\",$layoff_or_closure,$count,\"$date\",\"$location\",\"$street\",$city,$state,$zip,\"$cname\",$company_contact,\"$union_contact\"\r\n";

#! /bin/bash

for i in *.txt ; do
  echo $i
  ./ < $i >> report.csv
done
layoff.jpg (30.83 KB)

UPS Mishap

Woe to the sysadmin who trusts their UPS to work as expected. It turns out that some UPSs won't warn you when the battery is dead or low. You find out it's not functioning when you unplug it.

So, as a matter of course, it's necessary to test UPSs. It's not easy. You have to schedule system downtime and shut down the computer. Then unplug the UPS from the wall, plug a device into it, and test how long it stays up, and whether the UPS beeps. If the battery is dead, it's time to buy a new set of batteries.

Ubuntu KVM Switching Problem, and Fix


KVM switchers read the Scroll Lock (ScrLk) LED, switching computers when they see the LED toggle. Normally, you toggle it by pressing Scroll Lock twice. Ubuntu doesn't accept Scroll Lock, and doesn't turn the LED on. Not finding a way to enable it, I opted to use the suggestion in the linked article, and created a KVM switching script.

The script here creates a new command, switchkvm.

echo "xset led on; sleep .25; xset led off" > switchkvm
chmod a+x switchkvm

I put an icon in my toolbar so it's one click away. Attached is an ugly icon for it.

swindows.png (551 bytes)

Ubuntu Linux PS/2 Mouse Stopped Working

After upgrading to a new kernel, my USB keyboard stopped working. Arrgh, not again. I plugged in my spare PS/2 keyboard and started troubleshooting. The problem, it turned out, was that a version of the Ubuntu server kernel was installed, and that didn't boot up with USB. This wasn't the first time this had happened, so I deleted those kernels and ran grub-mkconfig to create a new grub.cfg.

Rebooting brought back the USB keyboard, but killed the PS/2 mouse.

I tried a couple suggested fixes. First was to use the "i8042.nopnp" option. You do this by adding "i8042.nopnp=1" to the kernel line in grub.cfg. That didn't work.

Second was to add "psmouse" to the /etc/modules file, so the PS/2 mouse driver gets loaded at boot time. This didn't work either.

The problem turned out to be what PS/2 device was detected. The kernel always found the "KBD" device. The motherboard had a single PS/2 port which could support either the keyboard or the mouse.

The solution was to power the computer off, then on again. The port reconfigured itself to support a mouse.

What happened: the port self-configured to a keyboard when I was fixing the keyboard issue. I swapped in a mouse while the computer was powered up, and it never got reconfigured, even though I rebooted several times.

VBA: Transforming XML Error Messages into VBA Errors (Raising or Throwing Errors)

This is trial code that I used to translate an error from a Yahoo web service into a COM ErrObject.

It's not real XML parsing, but good enough for this purpose. If an error message is sent, we extract the message and then use Err.Raise to throw an error.

Sub testRegex()
    Dim response As String
    response = "<?xml version=""1.0"" encoding=""UTF-8""?>:+" & vbCrLf & _
        "<Error xmlns=""urn:yahoo:api"">" & vbCrLf & _
        "   The following errors were detected:" & vbCrLf & _
        "        <Message>unable to parse location</Message>" & vbCrLf & _
        "</Error>" & vbCrLf & _
        "<!-- uncompressed/chunked Tue Aug 11 15:44:44 PDT 2009 -->"
    e = RegExMatch(response, "<Error xmlns=""urn:yahoo:api"">\s*.*\s*.*<Message>(.+)</Message>\s*</Error>")
    Debug.Print e
    If (e <> "") Then
        Err.Raise 123, , e
    End If
End Sub

Note that we don't create an instance of ErrObject (we don't do a "Dim e as ErrObject"). You can't instantiate one. There's only a single Err object in the environment, and you reuse it. That's why Err.Raise takes arguments, instead of allowing you to change the value of an Err.

The definition of RegExMatch is:

' Returns the first regular expression match of applying pattern test to Source
Function RegExMatch(ByRef Source As String, _
                      ByRef test As String) As String
    Dim regex As Object
    Set regex = CreateObject("vbscript.regexp")
    Dim match As Object
    With regex
        .Pattern = test
        .Global = True
        .MultiLine = True
    End With
    Set match = regex.Execute(Source)
    If match.Count > 0 Then
        If match(0).SubMatches.Count > 0 Then
            RegExMatch = match(0).SubMatches(0)
        Else
            RegExMatch = ""
        End If
    Else
        RegExMatch = ""
    End If
End Function

Now you can use exception handling to deal with errors from the web service.

In this application, we really just want to mark the error and continue encoding more data.

Exception handling is nice because the function calls are nested a few levels deep. The looping is done up at a layer where we do a lot of SQL. The network communication is done within a network communication method, and there's one class in-between. You want the error on the network side to affect the behavior of the loop up in the SQL-calling layer.

With exceptions, each layer just needs a little code to catch the error and re-throw it up to the caller. Eventually it will be caught by a caller that will log the error, and continue processing.

One of these days, I'll prep a nice example.
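In the meantime, here's a rough sketch of the layered pattern in Python rather than VBA. All the names here are illustrative, not from the actual application.

```python
# Sketch: an error raised deep in the "network" layer propagates up through
# a middle layer, and is caught in the top-level loop, which logs and continues.

def fetch_from_service(record):
    # network layer: raise on a service error
    if record.get("bad"):
        raise ValueError("unable to parse location")
    return record["value"]

def encode_record(record):
    # middle layer: no handling code needed; errors propagate to the caller
    return fetch_from_service(record) * 2

def process_all(records):
    # top (SQL-calling) layer: log the error and continue with the next record
    results, errors = [], []
    for rec in records:
        try:
            results.append(encode_record(rec))
        except ValueError as e:
            errors.append(str(e))
    return results, errors

results, errors = process_all([{"value": 1}, {"bad": True}, {"value": 3}])
print(results, errors)  # [2, 6] ['unable to parse location']
```

The point is that the middle layer stays clean: only the bottom raises and only the top catches.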

Vi and Vim, Macros

Vi and Vim have a "macro" feature to help automate routine editing tasks.

Sometimes, you get a document, a file, or some data that's just messed-up looking, or was formatted for printing, and you need to reformat it.

Most editors have some kind of macro function, where repetitive tasks can be automated. Unfortunately, these macros have, over the years, acquired an acute case of featureitis. Vim keeps it simple.

Unlike MS Word's Visual Basic, or even Excel's macros, which require some programming, vim's macros simply play back keystrokes. (There is a built-in programming language for vim, but that's a different feature.)

To record a macro, press q, then press a key to name the macro. The macro is named by a single character - you can't have a long name. You can use the letter keys and number keys. (If you use the number "2", it's easy to run the macro.)

Then, type your keystrokes.

To run the macro, press @ and then the name of the macro. If you selected "2", you can keep the shift key pressed down and type "@" again, and it'll still run macro 2. (My personal preference is to use the "q" key.)


The typical use of a macro is to reformat data. For example, I got some tabular data in a word processor. I couldn't easily extract the data, at first, but eventually figured out that it could be copied to the NVU HTML editor, which has a slightly better "copy and paste" function.

I pasted the tabular data into Vim, and got this:

1

Joe Blow

2

Harry Carey

3

Mary Christmas

4

Ann R. Key

I wanted one number and name per line. The macro was "JJJ[Enter]". It joins the number and name, then the third join closes up the gap below the line. The enter key moves the cursor down to the next line.

To execute this, I run the macro over and over, until I get:

1 Joe Blow
2 Harry Carey
3 Mary Christmas
4 Ann R. Key


A macro can run another macro. That allows you to run a macro over and over.

If you're writing a macro named "a", and you type "@a" at the end of the macro, the macro will call itself. This will cause the macro to be run over and over!

To stop the "infinite loop", press Control-C.

Typically, I don't bother with loops. Instead, I create a second macro that calls the first macro several times. The macro might be "@q@q@q@q@q" to run the "q" macro five times. Then, run that second macro several times, by hand. Often, there are small variations in the data that will cause the macro not to work perfectly, and I have to fix it up by hand.

If there's less than a hundred lines, it's easier to just type the 200 or so keystrokes to get the job done, rather than get complex.


One reason why vi/vim macros are more powerful (and popular) than macros in other editors is that vi/vim has modal editing. When you're moving around the file, you are typically moving through lines and words, not characters. Joining and splitting lines, deleting words and lines, and moving to the front and end of the line, and of the file, are single keystrokes.

This is a slightly higher level of abstraction than most editors, and vi/vim forces you to use these abstractions. When coupled with macro recording, the macros are that much more powerful, because you can move around the file more precisely.

My .vimrc

set nocompatible " be iMproved, required
filetype off " required

" set the runtime path to include Vundle and initialize
set rtp+=~/.vim/bundle/Vundle.vim
call vundle#begin()
" alternatively, pass a path where Vundle should install plugins
"call vundle#begin('~/some/path/here')

" let Vundle manage Vundle, required
Plugin 'gmarik/Vundle.vim'

" All of your Plugins must be added before the following line
call vundle#end() " required
filetype plugin indent on " required
" To ignore plugin indent changes, instead use:
"filetype plugin on
" Brief help
" :PluginList - lists configured plugins
" :PluginInstall - installs plugins; append `!` to update or just :PluginUpdate
" :PluginSearch foo - searches for foo; append `!` to refresh local cache
" :PluginClean - confirms removal of unused plugins; append `!` to auto-approve removal
" see :h vundle for more details or wiki for FAQ
" Put your non-Plugin stuff after this line

set guifont=Monospace\ 8
set autoindent
set shiftwidth=4
set tabstop=4
set fo=cro
set modelines=5
set expandtab

VirtualBox OSE: can't find kernel driver, run modprobe vboxdrv

I got a message to run modprobe vboxdrv, but didn't seem to have the vboxdrv driver.

It turned out that the vboxdrv.ko object existed (turned up by doing a "locate vboxdrv"), but not for my current kernel. The solution was to rebuild the driver for my kernel. To find out what kernel I had:

uname -r

If you don't have the vbox drivers, install them:

sudo apt-get install virtualbox-ose-dkms

The vbox drivers are built using DKMS, which is a driver framework. (It allows drivers to be built apart from the kernel, so the drivers are standalone, somewhat like they are with Windows.) Normally, the apt-get program will rebuild the drivers automatically, but if the kernel headers are not installed, they will not get rebuilt.

So, check to make sure your kernel headers are installed.

ls /usr/src

If you don't see a directory corresponding to the kernel version number, you need to install the linux-headers package from apt-get.

sudo apt-get install linux-headers

Run that, and you might get a list of potential headers packages to install. Choose the one that matches your kernel version, and install it. Example:

sudo apt-get install linux-headers-2.6.32-24-server

Installing a new kernel should trigger a rebuild of all the dkms-based drivers.

If the headers are already installed, add the --reinstall flag:

sudo apt-get --reinstall install linux-headers-2.6.32-24-server

Then, start up the virtualbox-ose service (which loads up the drivers):

sudo /etc/init.d/virtualbox-ose start

What Is the Difference between Access and Excel?

There's probably a frustrated IT or database person telling someone that they shouldn't be using Excel, that the data should be in Access or a database. The Excel user on the receiving end is probably wondering what Access is.

They seem similar. They both store data as rows and columns, but it's the differences that make a difference.

Excel lets non-programmers manage data in flexible ways.

That's why people like Excel. You put your lists in there, and it's easy to add different kinds of categorization. You can use highlights, colors, boldface, italics, and different font sizes. Some people use elaborate indentation. It's pretty awesome... for human beings.

For a computer, that's all "mess". While people can tell each other "the red background means so-and-so owes money, so make them pay up first," getting a computer to deal with that is tougher.

Someone who knows how to program the spreadsheet - to do math, or comparisons, or use filters, or make crosstabs - they'll tell you to add a column, and put a 1 or 0 in there to indicate that. They can then use a filter to produce a list of people who owe money. (At this point, this person might suggest using Access. That's usually ignored.)

Incidentally, the spreadsheet was invented in the 70s to do the math, not to store data. People just started using them to store data, and when the Mac and Windows versions came out in the 80s with fonts and styling, people started using the styling to organize the data, too. That's how people are. We do things the wrong way, and if it looks nice, we think it's cool. Yeah, we're stupid, or something.

Access lets programmers or semi-programmers manage data in flexible ways.

That clever person who could do the filters took the spreadsheet one step towards being used like a database. Access is a database system.

The main problem with using Excel to store data is that it's difficult to store large amounts of data, and manipulate it. The bigger the list becomes, the tougher the task becomes. Managing 100 rows is easy. Managing 1,000 rows is tougher. Some people end up making complex macros to perform the manipulations.

That's where Access shines - when you have a lot of data. Like in the example about the red highlight above, you can't resort to styling tricks; you have to make new columns and put values in them.

Instead of filters or macros and direct manipulation of the data, you perform "queries". A query is a database's way to extract a subset of rows from the database.

Access has a query designer.

Here's an example of a query:

SELECT name, address FROM customers WHERE debt=1

That selects a subset of customers where their "debt" value is 1, which means "true" in our system. That's not too hard, is it?

Access separates data entry from data reporting.

In a spreadsheet, the data you enter is the data you see, sort, and filter.

In a database, the data entry is separated out. Database systems typically have "forms" for data entry, and "reports" or "report writers" to print or export data.

We're all familiar with databases because we use the web. Some websites have fill-in forms. You submit them, and, generally, you get some information back, like an order number or an email. That's analogous to forms, databases, and reports: the fill-in form saved data into the database, and the page with the order number or email you receive is a report. The report is mostly a big template, and your data is this tiny thing, but it's still a report.

Access comes with a form designer, and a report designer.

It's a system

With Access, you have queries, forms, and reports. You also have a programming tool similar to macros. Each of these things is saved within a database file.

What's nice about having these things saved in a database file is organizational. Each of the things you want to do has a place to be filed, and a name. It's not all stuck in your head, like when you use Excel.

That "red highlight" example above could be saved out as a query, maybe named "qry People Who Owe Money". So someone who knows Access can read the query and run it.

In fact, you can still have a red highlight. You can do it in the reports. You can create a report that shows all the people, and puts a red highlight behind all the rows of people who owe money. The report can use a query that calculates the debt as the data source.

This description of what Access can do only scratches the surface.

The problem

The main reason people don't switch to Access is that it takes away a lot of things, like fonts, boldface, colors, and all the tools we use. What it offers is a more spartan environment, with only grids and text. So, people start using Access and think, "this sucks."

But it doesn't suck; you just haven't gotten to the good part yet. It gets good when the amount of data increases.

Excel is fine for a few hundred rows of data. Access is considered small for a database system: it's good for up to tens of thousands of rows. Database servers are typically used to store millions or even billions of rows, and hundreds of columns, across thousands of tables. It's vast.

What is HTML 5?

HTML 5 is a marketing term (kind of like "cloud computing") that has a somewhat imprecise technical meaning, but was created so that products and people could easily sum up their compatibility or knowledge and skills.

For example, Firefox 13 is HTML 5 compliant, and this website is HTML 5 compatible, and I know how to write applications using HTML 5 features.

HTML 5 roughly corresponds to the baseline web-browser experience on a new PC in early 2012.

HTML 5 is three browser-based technologies which can be used, together, to create web pages, web sites, and web applications that begin to rival what could be done with Flash and desktop applications in the recent past (around 2003 or so). Yup - you can now do in a browser what you could do, in other ways, nearly a decade ago.

The main difference is, with HTML 5, you can deliver this experience over the internet. And it doesn't require installing any software, which is hugely important.

The three technologies are: HTML, CSS, and JavaScript.*

HTML is a way to create pages which bring together text, images, and video. CSS, combined with HTML, allows designers to change the appearance of that data. JavaScript is a programming language that is used to manipulate CSS and HTML to create pages that respond like desktop applications. JavaScript also controls much of the "behind the scenes" technology to store and retrieve data on the computer, and across the internet.

HTML 5 refers to the totality of these technologies, at specific versions, with specific features, and a roadmap for future features. It's a moving target, but one which all the major browsers are aiming to reach.

Lastly, there is one final piece of the puzzle, but it's not part of "HTML 5" - that's the web application server. The app server provides the shared experience of the internet, so many people can go onto one site and use it, together.

* Note that all these technologies have existed since the mid 1990s.

CSS Hints for Technoids Who Forgot to Learn CSS

This article is being rewritten. If you want the latest, contact johnk at this domain.

The original was written: 2004-11-18 03:16:46 -0700.

Here's a bit of the article:

Dang, but it took me forever to learn CSS. Maybe I should have used a book. Here, I'm going to share with you the hard-found knowledge, presented using technical programmer jargon. (Revised in 2014.)

What is Cascading Style Sheets (CSS)? The typical answer is that it's a way to separate the way a page looks from the underlying HTML, which describes the structure of the document.

What is HTML? It's a markup language used to add a hierarchical structure and formatting codes to text. The HTML and CSS are interpreted by a web browser, to display a web page.

By itself, HTML is sufficient to do formatting that's adequate for term papers, short books, instruction manuals, and other basic documents (like this document). However, it's insufficient for doing graphic design for web pages. That is what CSS is for: precise formatting of structured text.

HTML as Code
To understand CSS, you need to understand HTML. HTML has two characteristics that programmers will understand. First, it is object oriented. Second, it's hierarchical. The stream of text is treated as a hierarchy of objects, which contain text, and also contain other objects.

Here's the simplest HTML document:

<!doctype html>
<html></html>

The first thing is called a doctype declaration, and it's like a header line that identifies the document. The second part is the HTML tag. There's an opening tag, <html>, and a closing tag, </html>.
Here's a more conventional HTML document:

<!doctype html>
<html>
  <head>
    <title>sample document</title>
  </head>
  <body>
  </body>
</html>

Within the HTML tags are pairs of tags for HEAD and BODY, and within HEAD, there's TITLE. As you can see, it's a hierarchy of objects delimited by tags. The code, when interpreted by a web browser like Firefox or Internet Explorer, is converted into objects. The tree of objects is called the Document Object Model, or DOM.

When people think of tags, they often think "markup", but don't yet think "object". Start thinking of them as objects that contain what's between the opening and closing tags.
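That object tree can be made concrete with a short sketch in Python, using the standard library's html.parser to print the nesting of a tiny document (the document string is just an example):

```python
# Print an HTML document's tags as an indented tree, to show that
# markup is really a hierarchy of nested objects (the DOM).
from html.parser import HTMLParser

class TreePrinter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0

    def handle_starttag(self, tag, attrs):
        print("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

doc = "<html><head><title>t</title></head><body><p>hi</p></body></html>"
TreePrinter().feed(doc)
# Prints:
# html
#   head
#     title
#   body
#     p
```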
Here's a final example, and the one we'll use to describe and explore CSS:

<!doctype html>
<html>
  <head>
    <title>Hello, world.</title>
    <style type="text/css">
      body { font-family: Arial; }
    </style>
  </head>
  <body>
    <h1 id="headline">Hello, world.</h1>
    <p class="latintext">Lorem ipsum...</p>
    <p>This is regular text</p>
  </body>
</html>

The HEAD object typically contains resources and information about the document, but not any text of the document. The STYLE tags delimit a block of CSS code, which the browser will use to style the page. The code within the STYLE tags is not displayed.

The BODY tag now contains three things. H1 is a heading tag. It has an attribute "id" which has the value "headline". Attributes are like object properties. The ID attribute is used by CSS to identify tags, and must be unique within a document.

The P tag delimits a paragraph. The default formatting for P is flush left, with a margin above and below. The CLASS attribute is similar to the ID attribute, but more than one tag can have the same value for CLASS. So multiple paragraphs could have the "latintext" class.

Now let's get into the CSS code. Here's the code again:

body { font-family: Arial; }

CSS is a simple language with very little syntax. That one line is called a RULE.

The part on the left, “body”, is called the SELECTOR. The part on the right, in braces, { font-family: Arial }, is called the STYLE DEFINITION. The parts inside the braces are called STYLE ATTRIBUTES and VALUES.

What that rule says is: for all tags that match BODY, set the font to Arial. Only one tag matches, and it contains the whole visible document.

CSS is a DECLARATIVE LANGUAGE. The programmer declares how the document should look, and the browser figures out how to find the objects that match the selectors, and then applies the style definitions to those objects.

A CSS program is a stream of rules. For any given HTML object, all the styles that apply to the object are combined, with rules further down in the stylesheet overriding the earlier ones.

For example, we could add some more rules:

  #headline { font-family: "Arial Black"; }
  .latintext { font-family: Times; font-style: italic; }

Aha, we see a couple different selectors. The first is a selector by ID:

  #headline matches only the object with id="headline".

Next is selector by CLASS:

  .latintext matches any element with class="latintext".

The style definitions are pretty self explanatory, but the selectors require a bit of explanation and example. They are extremely important, though, because understanding selectors will help you to use CSS properly. Without this understanding, you will make some mistakes that might cause problems in future iterations of your website.

CSS Selectors are a kind of querying language. The query is run against a hierarchical database of objects, the HTML DOM.

The least specific selector is the tag. After that is the class, which can apply to more than one tag. There are several more levels of specificity, which I'll discuss in a moment. Then, way over at the other end of specificity, is the ID.

There are a few other ways to query the DOM.

  body h1 { ... }       Applies to situations where an H1 is within a BODY.
  body>h1 { ... }       Applies only to situations where BODY is the parent of H1.
  p.latintext { ... }   Applies only to P tags with the class="latintext" attribute.

What is a Server?

I've been asked this simple question, and given the simple answer: it's a PC that's on all the time, running services for others. Well, that's right, technically, but it's also the wrong answer to tell everyone.

This post is inspired by this video: A good video about servers by Eli the computer guy. (His videos are good. Kind of long and repetitive, but basically right on the money.)

The first followup question I get is usually "it's not a special computer?" Well, um, yeah it's special, but it's basically a regular computer.

A good analogy is the difference between utility vehicles. There are mini trucks, there are 1 ton and 2 ton trucks, and there are longbeds. They're all vehicles with beds, with different capacities, different sizes, and so on, but they're basically the same technologically. The difference is that if you load up the mini truck with a bunch of bricks, you're going to damage the suspension.

A server computer generally has better performance, particularly when you have multiple people trying to access files. Servers generally have redundant hard disks in removable trays, redundant memory that can correct errors, multiple CPUs so the system can survive losing one, and redundant ethernet connections so one can burn out. That said, they are generally slower at some other things, like graphics. Usually, the graphics suck. You usually don't have as many USB ports. The fans are sometimes as loud as a small vacuum cleaner. So, as a PC, they have some real negatives.

Server software generally installs "lean" - the features aren't turned on. Years ago, it used to ship with many features turned on, but sysadmins wasted hours removing all the stuff they didn't want. So the new style is to deliver the software with everything turned off.

The latest Windows Server, 2012, can even be installed without graphics. Likewise, if you install a Linux server, you may or may not get graphics. Either way, you're going to get almost nothing. You have to install that stuff later.

The weird thing, of course, is you get less, and have to pay $2400 for the software. LOLZ!

The same goes for the hardware. It's expensive. Is it worth it?

Well, that depends. You usually don't have a choice. The server will be more reliable. If you want full redundancy, you have to get the expensive software, and set it up.

The only exception is if you're building on Unix and can build a redundant network of cheap computers. It's doable, but it's also a lot of work.

Mathematically, if you spend $10,000 on a couple of good servers, your redundancy (and performance) isn't going to be as good as a network of ten $1,000 computers. The network will be around 8 or 9 times faster, and the reliability will be, I'd guess, several orders of magnitude better.

Think about it - what are the odds that a crap computer will expire this year? Pretty high. But what are the odds that 5 will expire? Probably close to nil. Not only that, but the time to replace a failed computer is around a day. Just go to the store, buy another computer, and rig it to replace the failed box. When was the last time you lost half your LAN within a couple of days? Yeah, I've never really heard of that happening either.

On the other hand, an expensive server might be less likely to fail, but there's still a chance. When it fails, you're left with one other computer. Again, odds are, it's safe. But your risk exposure is going to be a day or two to get a rental server, and a week to a month until you can get the broken server replaced. The only way to eliminate that exposure is to buy a third computer.
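The intuition in the last two paragraphs can be put into rough numbers. This is only a sketch under loudly invented assumptions: the annual failure rates below are guesses, and failures are assumed independent.

```python
# Back-of-the-envelope outage math with made-up annual failure rates.
p_cheap = 0.30    # assumed: chance one cheap PC dies in a given year
p_server = 0.05   # assumed: chance one expensive server dies in a given year

# Total blackout for the cheap network: all 10 boxes die the same year.
cheap_network_outage = p_cheap ** 10

# Total blackout for the two-server setup: both servers die the same year.
server_pair_outage = p_server ** 2

print(f"all 10 cheap boxes dead: {cheap_network_outage:.2e}")
print(f"both servers dead:       {server_pair_outage:.2e}")
```

Under these guessed rates, the cheap network is a few hundred times less likely to go completely dark, even though any single cheap box is far more likely to die.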

The trade off is that it's much easier to administer two servers than it is to manage a network of 10 machines. It uses less space, less electricity, and the overall setup is just less complex. So for a small business, the simpler solution is the right one, even if it's riskier and really a bit more expensive.

For a data center, the cheap path is the way to go. You not only have less risk, you're also going to develop the technology to grow the network via this redundancy. The Amazon EC2 model is to run a bunch of cheap computers, and then have them basically act like a smaller network of extremely reliable computers. They charge double what web hosts charge, but they scale up. So at some point, when you outgrow a web host, and are faced with buying an expensive server, the EC2 system ends up cheaper. Since the system runs on cheap hardware, Amazon makes money off the margin between the cost of a cheap redundant network, and an expensive, less redundant server.

(Note: Amazon probably buys custom-made computers that are even cheaper than regular PCs. They're probably motherboards with a couple CPUs, no graphics, no disk IO and no power supply.)

Video Comment

ECC registered RAM is memory with extra parity bits, similar in concept to RAID-5. It's worth it. I had a server that experienced around 5 RAM errors per year. ECC is not just memory, but infrastructure that tells you when RAM is failing or flaky.
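To see the parity idea in miniature, here's a toy sketch: one extra bit per word detects a single flipped bit, though it can't say which bit flipped. Real ECC (SECDED) stores enough extra check bits to locate and correct the flip; this toy version only detects it.

```python
# Toy parity check: detect (not correct) a single flipped bit in a word.
def parity(bits):
    return sum(bits) % 2

word = [1, 0, 1, 1, 0, 1, 0, 0]
check_bit = parity(word)   # stored alongside the data word

word[3] ^= 1               # a bit silently flips in "RAM"

corrupted = parity(word) != check_bit
print(corrupted)  # True: the flip was detected
```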

WiFi: Improving Reception with a Chip Bag

The best ways to improve WiFi reception are antenna positioning and using reflectors to guide the signal.

I have a PCI WiFi card that's positioned in a terrible location. It's in the bottom slot, and is surrounded by coiled-up power cords, computer cables, and other crap. So, my signal is weak. I get "2 bars".

On a lark, I took a big potato chip bag (Lay's) and cut it into a big rectangle. Then I washed the oil off, and taped it to the wall behind the WiFi router's antenna. I got an immediate one-bar improvement in signal. It wavers between two and three bars now.

The bag has a thin layer of foil, like most chip bags nowadays. That foil works as a reflector, bouncing the signal in my general direction, and also reducing interference from the other side.

There's no curve in this version of the reflector. A curve would tend to focus the signal, and thus narrow its coverage. I didn't really want that because the signal's used throughout the house.

Eventually, I'll glue this foil onto cardboard, and add a very slight curve to it to direct the signal a little bit more toward me. Mounting it on cardboard would remove some of the crinkles, which are probably distorting the signal.

Also, using old soda cans, I'll set up some parabolic reflectors for my wifi card's antenna, and also figure out how to deal with the wire and noise problems. The real solution is probably to get an antenna that can be positioned higher up.


Windows Backup, Backup Exec, and System State Recovery

I was using Backup Exec to maintain several backups of Microsoft Windows Server 2003. The backups were kept on a different server (a small NAS box).

*** Impatience:
One time, however, the system had a very hard time starting up, and the user interface eventually slowed to a crawl. It was impossible to interrogate the system, much less do any work on it. (It could have been malware - I'm still going to have to try and fix this box.) Out of impatience, I rebooted the machine by holding down the power switch. (Never do this again. Wait it out as long as possible.) This corrupted the Active Directory database.

Subsequent boot failed, and the OS was telling me to boot into Active Directory Restore Mode. So I did. A search of help found some instructions on testing the AD databases. The tests failed. Repair also failed. (The weird thing was, the JET db engine seemed to be failing.) A web search said to perform a restore from backups instead.

AD is normally backed up as part of the "System State". In Backup Exec, System State is backed up as part of the Shadow Copy Components. Shadow Copy is an OS feature that takes a snapshot of a file so it can be backed up while in use -- this is necessary for backing up the numerous files that make up the operating system.

*** Can't start up Backup Exec:
Unfortunately, booting into AD Restore mode meant the domain controller (which wouldn't work anyway) was down. The Backup Exec services were configured to log on as a member of the domain, so BE wasn't starting up. Thus, a restoration was not possible.

A quick read of the BE help found an article about how to restore system state when in AD Restore mode. You have to modify the services to start up as the Administrator. Once started, you have to perform a restoration of system state, making sure that the credentials are set to Administrator.

*** Can't get to backup files:
Also, the NAS had been set up to use AD as well, so it was inaccessible. There was a non-AD username and password on there, so I re-logged-in as that user, and everything was okay again.

A System State restoration was performed. The data was checked using NTDSUTIL, and it was okay. So a reboot was performed, and the system came up fairly quickly and without incident.

*** It took 1.5 hours:
The main problem was, it took 1.5 hours to perform the complete restore. This was due, partly, to reading instructions, and partly to the time required to start up the server. A better backup configuration could have kept the downtime to around 30 minutes. This is described below.

Solution - use Backup to save System State, as well as Backup Exec.

Backup is the built-in backup software. Unlike Backup Exec, it will run without AD. It can be scheduled to create a system state backup once a day, to a local disk. Once a week, it could create another, slightly longer-term system state backup. This way, we can quickly restore to yesterday's state after a single reboot. (This would take 10-15 minutes.) If that fails, we can try the weekly backup (another 10-15 minutes). If that fails, rely on the backups on the remote server or tapes.

Windows Remote Desktop on Windows 7, Speeding Up Slow Performance

If Windows 7 seems much slower over Remote Desktop, try changing the theme. Click the Start Menu, then type "theme". The option to change the theme should appear. Click on it.

Set the theme to "Classic" (it's near the bottom of the list). This removes all the gradients, leaving the system looking like Windows 2000. Without the gradients, everything will be faster.

Windows Small Business Server 2003: disabling sbscrexe.exe

This is an awesome tutorial on how to kill this annoying process that forces the owner to run SBS as a domain controller.

It's also a great howto about permissions in Windows.

We have to do this because Microsoft Marketing decided that every copy of SBS should run as a Domain Controller. If you happen to have two licenses of SBS, and want to turn one SBS into a plain-old-server for file-serving purposes, or some other lesser use that would benefit from a leaner OS setup, you cannot. SBS forces you to do the "domain" thing, or you can go purchase another license for the regular Win2k3 Server.

This is why I prefer to deal with Linux. Less marketing scheme BS. Almost everything is licensed free, per seat, per cpu, or per machine.

Here's the text, by "Blarghie", copied here:

Paft's original post drew me to this thread after a Google search.

I also didn't want to have to bother with this crap that my legitimate copy of Windows SBS couldn't run unless it was a DC. As it happens, we already had a second licence of SBS and simply wanted to re-use a currently un-used licence of SBS to implement a webserver, but without all the bloat that the SBS install affords.

The first thing I did was install the server normally. The first chance you get to cancel the install of the SBS bloat is when Windows starts for the first time after install, and I seized my opportunity.

What I didn't see however was the quite frankly ridiculous scenario whereby Microsoft had decided to force restart the server every hour and NET SEND spam the network "this server doesn't comply with licensing requirements" across the entire network. Microsoft can stick that.

Anyway, like I said, it was Paft's post that brought me here to the forum, and I've found a slightly more elegant solution to this problem rather than just aggressively killing the process until Windows gives up trying to start it again, and I'd like to share it in the hope that Google will re-index and pick it up for others to use. You may have noticed this service cannot be disabled via the MMC snap-in.

My search term on google was: how to stop the SBCore service

Anyway, down to business…
- Tools you'll need – Process Explorer from

As you probably know, you have a service called SBCore or "SBS Core Services", which executes the following process: C:\WINDOWS\system32\sbscrexe.exe

If you kill it, it just restarts – and if you try and stop it you are told Access Denied.

If you fire up Process Explorer, you can select the process and Suspend it. Now we can start to disable the thing.

Run RegEdit32.exe and expand the nodes until you reach the following hive / key:

Right click this, hit permissions, and give the "Administrators" group on the local machine full access (don't forget to replace permissions on child nodes). F5 in regedit and you'll see all of the values and data under this key.

Select the "Start" DWORD and change it from 2 to 4 – this basically sets the service to the "Disabled" state as far as the MMC services snap-in (and windows for that matter) is concerned.

Next, adjust the permissions on the file C:\WINDOWS\system32\sbscrexe.exe so that the EVERYONE account is denied any sort of access to this file.

Then go back to Process Explorer and kill the sbscrexe.exe process. If it doesn't restart - congratulations!

Load up the services MMC snap-in and you should find that "SBS Core Services" is stopped and marked as Disabled.


Windows XP Boot USB

1. A USB drive. I ended up with a SanDisk OEM'd one from Staples.

2. Ultimate Boot CD for Windows is good because it has a lot of tools. There are others, but many seem to be based on Bart PE.

3. PE to USB, takes Bart PE output and writes a bootable USB drive.

4. If it BSODs, HERE is a fix.

Windows XP, Installation Acrobatics with Product Keys

I've wasted around five hours this past week dealing with miscellaneous Windows XP licensing issues. Ever since Microsoft (basically) started tracking users, it's been difficult to resort to the tried-and-true technique of "justified piracy" to maintain one's legitimate software license rights. After all, the way 99% of the world sees it, if you paid for a licensed copy, but lost the CD, you're entitled to use the software legally.

What happened was, my work didn't have a copy of the Windows XP installer CD. Instead, it had a "Volume License Key" CD, which is a slightly different installer that accepts only VLKs, as I learned when I tried to install XP using the key on the sticker on the computer. Because they had the VLK CD, I proceeded to call MS to see if they had a VLK agreement. They wanted me to assent to what amounted to a software audit for the site's Windows licenses. That seemed a little dicey, because there were definitely duplicate keys in use, but each machine also had a genuine XP sticker on it with a unique key. The admins probably used the same key out of habit when they upgraded boxes.

It was also going to take days to process this bureaucratic mess. So, the internet came to the rescue. I found some VLK serial numbers on the internet, and installed from the CD. Then, I went in and tried to change the key. This invalidated the original key, and forced me to use Microsoft's tool to change the key. After supplying the original key on the sticker on the computer, everything went smoothly.

For instructions on modifying the key:

If you want to avoid all this hassle, use Linux.

Windows XP, Windows Server 2003 Miscellaneous Links

Microsoft's list of Services and descriptions.

How to Install DLLs with Regsvr32

Windows: Drive Mapping Weirdness, Lost Data

There are a few weird situations with Windows and drive mapping that should be noted. One situation points to bugs in Windows, and the other to some malware.

Windows Weirdness

If you are on a server, and a drive is mapped to a shared folder that's also on the local computer, you can lose data. I was updating an ADP installation, and the data was in D:\ADP (not really, just for explanation purposes). This was also shared as \\Accounting\ADP. That share was mapped as drive F:.

Running the update on the data in drive F: didn't work. Additionally, attempting to update the files in D:\ADP seemed to cause Windows to re-resolve the path back to drive F:. So I had to unmap drive F:, then perform the update.

The data loss was recorded as some kind of SMB event. It was all very strange, especially because, even though the files were shared, they were on the local machine, so it's not like the data really went over the network. This indicates there's some kind of error in the file server software, or in the networking software.


Another time, on a client computer, the drive mappings to the file server were working, but access to the shares via UNC paths (i.e. \\fileserver\sharename) wasn't working.

I never figured this out, but running the free F-Secure malware scan fixed it.

My guess is that the name resolution was being intercepted, and mapped drives may connect to the server through a more primitive API that doesn't rely on name resolution. This is just a guess.

Windows: Installing Printer Drivers for 64bit Clients with a 32bit Server (Printer Driver Hell)

The Windows print server works like this: when you go to a server and double click on a printer icon, the system downloads a printer driver from the server and installs it into your local copy of Windows. The server has a small database of printer drivers, and each has a list of compatible operating systems and "bits" (word sizes and machine architectures).

An x64 system will try to download the x64 driver. If it's not available, it will download the 32 bit drivers.

The drivers are typically installed by running an installer on the server. This unpacks drivers, and then installs a printer. That printer is then shared.

If you need to install drivers for different architectures, you can usually download the files and install them via the printer's properties.

Unfortunately, the Xerox Global Printer Driver system does not offer the driver files on their own. All the files are bundled up in an executable. The 32 bit drivers are in a 32 bit executable, and the 64 bit drivers are bundled in a 64 bit executable.

So if you have a 32 bit server, and a 64 bit client, you are in trouble, because you cannot run the 64 bit driver bundle on the server. What you need to do is unpack the bundle on the client, and then install those files on the server via the client. After that's done, you must re-install the printer. Here it is in detail.

- First, as an administrator, install the Xerox Global Printer Driver, 32 bit, on the server in the regular way. Run the installer, but do not install a printer. Go to the printer properties and specify a new driver. Point the installer to the driver files, and they'll be loaded up.

- Log in as a domain administrator on the client.

- Download the Xerox Global Printer Driver system. Focus on using the Postscript version. The PCL6 version seems to fail.

- Double click the installer, and watch it unpack the files into a folder on the C: drive. Remember that folder. It will ask if you wish to install a printer. Decline that offer.

- Go to the server and double-click on the icon for the printer on which you wish to install 64 bit drivers. The printer should auto-install itself.

- Open the printer, and open its properties. You should get an error alert because the driver is the wrong type. The properties should appear shortly.

- Click on the Sharing tab, and click on the Additional Drivers... button.
- Check off x64, click OK.
- Browse to the drivers folder (you remembered this above), and click OK.
- It will ask again for a GPD file. It's in that same folder, so specify it, and click OK. The drivers should start uploading from the client to the server. (That's how the files will get up there.)

- Next, you must go to your Printers and Devices and delete the icon for that printer.
- Go to the server, and double click that printer's icon. This time, the system will ask you to install drivers.

What happened is, the first time you installed the printer, it installed the 32 bit drivers because the 64 bit drivers were not available. The second time, it found the 64 bit drivers and installed them.

As a final step, test the installation.

Windows: changing the password for a network share

If you have a network share mapped to a drive letter, and it stopped connecting because the password changed, Windows won't ask you to correct the stored password, or even let you delete it. To fix this, go to the User Accounts control panel (type it into the Start Menu's search). Click on "Manage User Accounts", then the "Advanced" tab, and then "Manage Passwords."

All your stored credentials will be listed. You can delete or change these from here. Deletion is easier, because when you reconnect, it'll show the password dialog a few times until you get it right.

Xubuntu Process List Notebook

Ever wonder what Xubuntu is running when you start up? Here's a hyperlinked document based on running "pstree -A".

     |                `-{NetworkManager}
     |           `-sh-+-ssh-agent
     |                `-xfce4-session
     |                    |-hald-addon-cpuf
     |                    |-hald-addon-inpu
     |                    `-hald-addon-stor
     |           |-orage
     |           |-xfce4-terminal-+-bash---pstree
     |           |                `-gnome-pty-helpe
     |           `-{xfdesktop}

Your Computer Has Been Reinstalled

System Name:

Owner's Account:

Administrator's Password:

Your computer has been wiped clean and reinstalled. Your data was backed up as best as possible, and has been restored to your "My Documents" folder.

An extra account, named Limited User, has been created. This user lacks permission to install software. For additional security against viruses, use the Limited User account instead of the owner's account.

The following have been installed:

Norton Antivirus - which came with your computer.

Firefox - this is a replacement for Internet Explorer, and tends to be a less popular target for virus attacks.

Microsoft Office - this was on there before.

A CD is provided with the following:

Drivers for your computer - they were downloaded from the manufacturer's website.

Several trials of anti-virus products, including Avira and ClamAV, which are free.

Your Norton anti-virus expires in 90 days, and you will either need to start paying for it, or purchase another antivirus program like Avira, Kaspersky, Trend PC-Cillin, or another program. (There's a new ad gimmick at where you can get "free" antivirus software by buying products you don't need, and getting on all the junk-mail lists.)


Sick of tYPING lIKE tHIS? Wish you could press Control-C without contorting yourself? If you're on Ubuntu, there's a feature to help you out:

On KDE (on Gentoo at least), it's under Control Center -> Regions and Accessibility -> Keyboard Layout -> Xkb Options, in the list.


nohup - runs your programs after you log out

The following command will run the script, and keep it running after you log out.

nohup ./ &

Miracle? No. nohup just makes the script ignore the hangup signal (SIGHUP) that's sent at logout. Since the script never acts on the signal, it won't exit when you log out.

The interesting thing about the command is that it gives you an idea of how easy it is to write nonterminating programs. You have to do work (or let the library do the work) to automatically exit the program when the user exits.
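The mechanism can be sketched in Python (POSIX only). This shows just the signal handling that nohup arranges for its child, not the output redirection to nohup.out:

```python
# Ignore SIGHUP, the "hangup" signal delivered when the controlling
# terminal goes away at logout. This is essentially what nohup sets up
# before launching the script.
import signal

signal.signal(signal.SIGHUP, signal.SIG_IGN)

# Confirm the handler took; the process would now survive a logout.
print(signal.getsignal(signal.SIGHUP) is signal.SIG_IGN)  # True
```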

rss2txt: RSS Headlines Output as Text

This is a script that takes an RSS URL as an argument, and emits the headlines. Potentially useful if you have a small text-reading device that doesn't handle HTML.

#! /usr/bin/perl
# Fetch an RSS feed (URL given as the first argument) and print the headlines.

use strict;
use warnings;
use XML::RSS;
use WWW::Curl::Easy;

my $curl = WWW::Curl::Easy->new();
$curl->setopt(CURLOPT_URL, $ARGV[0]);

# Collect the response body into a scalar instead of printing it to stdout.
my $response_body;
open(my $fileb, ">", \$response_body);
$curl->setopt(CURLOPT_WRITEDATA, $fileb);

my $retcode = $curl->perform;
die "fetch failed: " . $curl->strerror($retcode) . "\n" if $retcode != 0;

# Parse the fetched XML and print each item's title.
my $rss = XML::RSS->new();
$rss->parse($response_body);

foreach my $item (@{$rss->{'items'}}) {
    print $item->{'title'}, "\n\n";
}