# Novice's Notebook

This is a repository of "novice" articles, written with the intent of driving more traffic to the site, and getting more ad clicks. It's pretty crass, I know, but the information may be very useful. Some of the content is adapted from the diy notes, and other notebooks, which are a bit rougher than these.

Most of these articles are not authoritative, because they're based on what I'm learning, as I'm learning it.

# Anti Virus Problem: a Hacked Shell that Won't Run EXE Files

I'm starting to forget this one already, but recently I dealt with a virus that hacks the shell and inserts a handler for .EXE files.

You know how, when you double click on .DOC files or .XLS files, the system automatically opens them with the correct application? The way that works is that Windows Explorer has a mapping that describes how to open a .DOC or .XLS file with Word or Excel.

The normal thing for .EXE files is, basically, to do nothing. Just run it as-is.

A virus might alter this, so that, instead of running the EXE file, it runs another program first. (Then, that program will run the EXE file as normal, so that you don't notice something's wrong.) The particular instance I had popped up a window telling me I had a virus, and clicking on their button would sell me a product to remove it.

At least these intruders were somewhat honest.

After finding this problem with RegEdit, I edited out the issue. However, I also made a serious error, and effectively disabled my ability to execute any EXE file.
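For reference, the shell's handler for EXE files lives in the registry. A sketch of what the clean default looks like, from memory of XP-era systems (verify against a known-good machine before importing anything):

```
Windows Registry Editor Version 5.00

; restores the default "open" command for .EXE files,
; which simply runs the program with its arguments
[HKEY_CLASSES_ROOT\exefile\shell\open\command]
@="\"%1\" %*"
```

A hijacked machine will usually show some other program's path in that value, with the original command tacked on after it.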

Getting out of this situation was difficult. What I ended up doing was figuring out a way to execute a file without passing it through the shell.

The solution was to use the Scheduled Tasks to run a .BAT file. That .BAT file contained a line that started RegEdit. I think this task was scheduled to run as Administrator so it would be able to save the registry (but I may be wrong on that part).
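The rescue .BAT file itself can be a one-liner. A sketch of what mine looked like (the path is from memory and may differ on your system):

```
@echo off
rem rescue batch file, launched by Scheduled Tasks rather than the shell,
rem so the virus's EXE handler never sees it
start "" C:\WINDOWS\regedit.exe
```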

The point here is that Scheduled Tasks don't run through the shell. They run in the old Command Line Interface (CLI).

I guess the big question is - why can't I just run the CLI? Well, the CLI is also an EXE file - it's cmd.exe. So when you try to run that through the shell, it ends up being intercepted.

# Anti-Pattern: Working With Live Data

I recently lost a chunk of data while I was developing a nice little macro to produce a report. How it was lost is pretty sad. I had become used to pressing a few keys to clear out my spreadsheet, and I accidentally pressed the keys on a spreadsheet of the live data. Pffft. Data vanished.

I luckily had most of the data in another document, and restored some of the lost data, but the rest was gone for good. All this was due not to faulty code, but to my failure to create a development sandbox.

Yes, this was only a macro, but, even for something so simple, it's smart to make a separate place to develop it. This sandbox would have contained a copy of the data.

A sandbox is better than a backup. That's because the sandbox is a minimal subset of what you need to write your program. The real deployment environment is usually a lot more complex. Backing up the real environment, so it's safe to develop in there, could be more difficult than you might imagine, and take a long time, too.

I tell everyone "work on a copy of the data, not on the original." Well, "physician, heal thyself," is what I should be told. I needed to work on a copy of the data, and not the original.

# Backup Book

How to Backup is a free online mini-book explaining basic ideas about backing up your network, backup technologies, and backup strategies to keep your systems online and your data available.

How to Backup is a simple read. It doesn't get too theoretical. It doesn't cover enterprise backup - it's for small businesses and home offices.

You think you know what a backup is, but, do you really?

What is backup?

A backup is a copy of your data.

A backup is an archived copy of your old data.

A backup is a system that can be used to deliver your data, if the primary system fails.

A backup is a system that keeps operating, transparently, even if part of the system fails. It's fault-tolerant.

A backup lets you recover from bad data, quickly.

A backup with frequent incremental backups lets you undo a huge run of bad data.

A backup is a part of a system that costs less than the entire system, that allows nearly all people to keep working in the event of an equipment or data failure.

### Links to variations on this booklet

How to Backup the Network at Home

# Warning: Parts of this document are obsolete for larger disks. A short report on a failure.

I recently had a RAID5 array fail, and learned something about backup: it's not just about the data, but also the recovery time. It took several hours to bring the system to a state of usability, and several days of work to reach a state of relative safety, and that required bringing someone in to help with migrating files. Subsequently, we decided on a two-server setup based around making incremental disk images. Incremental disk images should make recovery within an hour feasible.

A RAID array with 700 gigabytes of data takes hours, even days, to back up. It takes even longer to restore, because writes take longer.

Exchange recovery proceeds incredibly slowly. A seemingly small 30 gigabyte database took what seemed like half a day to recover.

These two facts can put you in a situation where you have all your data on backups, maybe even multiple backups, but recovering from a failure will take a very long time, forcing the entire business offline for a day, or longer, costing hundreds of dollars per hour (or more) until the system is fixed. That doesn't even include the real value of the work, which (as any leftist would tell you) is greater than the costs of doing the work.

This is an unacceptable situation. Ultimately, it's a good value to spend a few thousand dollars to have a redundant system on-site. Buy enough capacity for two servers, use them both all the time, and when one fails, move all the work onto one server for a few weeks (until the replacement is shipped and configured).

## RAID nightmares

Large disks are statistically more likely to suffer read errors. Today, all disks ship with errors, and simply map them out. So they need to be continually scanned so the disk can find and fix these errors.

A RAID5 array failure can be difficult to fix. When a disk fails, you can replace it, but if you haven't been running the background consistency check feature for months, it won't be able to rebuild the array successfully: during the recovery you are likely to suffer a read error and then the entire array will go offline.

It's better not to replace the failed disk. Instead, force the entire array back online, and then perform a file-level backup, and restore to a fresh disk. Don't run a consistency check, because that will cause the RAID controller to take the array offline when it hits the error. Doing a file-level backup seems to be more tolerant of errors, or maybe the sectors with errors are just less likely to be read.

Forcing the array online will allow the business to continue operating. Just be aware that the array is damaged and all the data needs to be migrated off of it. It's a zombie disk, undead, and no new data should go on it.

Install a fresh disk, and start migrating all the active data to it, and migrate users onto that disk. This won't take much time, because your active data set is typically small. It'll only take a few hours to do this for most scenarios. It won't be so easy for older server-dependent software, but for newer software with a cleaner separation between client and server, it'll be easy. Set up a frequent backup for this data.
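On Windows, a bulk migration like this can be scripted with robocopy. A sketch (drive letters and the log path are placeholders); the short retry settings matter on a damaged array, so a bad sector doesn't stall the copy for hours:

```
rem /E copies subfolders including empty ones; /R:1 /W:1 means
rem one retry with a one-second wait, instead of the default of
rem a million retries; the log records which files failed
robocopy D:\ActiveData E:\ActiveData /E /R:1 /W:1 /LOG:C:\migrate.log
```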

If you haven't started a full backup, do so, and all the older files will be covered by this backup.

If the C: drive is on the array, you will need to image the partition and then move it to the new disk. This is tricky (and we called in our consultant to do it). I'm not sure how to do it, but it requires knowledge of the Windows boot sequence, and may require editing the boot.ini file and the registry so it'll try the new partition first, and totally ignore the old partition.

(This isn't any easier than on Linux. The lesson I learned is that being able to manipulate or even recover and create the boot sequence is a must-know skill for sysadmins. It's also hard to learn and practice, requiring spare hardware and whatnot.)

Once the system is on stable new disks, you have to re-unify the active and old data. I used WinMerge, a great file comparison tool, to do this.

For backups, I used NTBackup - it was an old system. NTBackup has flaws where it'll just fail to save some data. It's also very "quiet" about this - you have to read the final report. I used the error report to build a file list that NTBackup could use to perform an additional backup. Usually, this second try would result in all the files being saved.

Restoring data onto Server 2012 and Server 2008 R2 was weird, because the new OSes don't use NTBackup. You need to dig around to find tools to restore from NTBackup bkf files. The tools work fine.

The newer backup tools are all centered around disk images. The built-in tools don't do incremental backups, so you need to find a 3rd party solution for those. We're going to use ShadowProtect, which is sold by our consultant. I don't know the price, but the market rate for Windows backup with incremental backups is around $1000.

For an equivalent disk-image-based backup system on Linux, you use either software or hardware RAID (I prefer software) and use LVM volumes and virtual partitions. You use "snapshots" to freeze the disk state, and compare disk states. The differences are copied to another computer with a mirror of the partition (via rsync). The main problem is that system performance with snapshots is worse, so you have to work around that.

# Archives and Archiving Files and Documents

Archiving is different from backups. Think about them separately.

An archive is an organizational strategy for data. It's a structure into which data can be stored in a way that makes it easy to retrieve the data in the future.

There are a few different ways to organize information. To use some computer terms: "tables", "time", and "hierarchy".

Tables refers to database tables, where data is organized into records and fields (or rows and columns). A record is a unit of data, like a row in a list. A field is information about the data, or the data itself, like the columns in a row. The useful property of a table is that every row has the same columns, so you can sort and group by columns.

| Name | Age | Sex |
|------|-----|-----|
| John | 40  | M   |
| Rosa | 36  | F   |

A hierarchy is like a filing system of folders.

Chronological organization is to organize information by time, so you can retrieve the data from a specific time period.

The computer's file system uses all three methods of organization. Each file has common fields, like the modification time, size, and usually a file extension. The files are stored in a hierarchy, and people typically name the folders uniformly.
This uniform naming breaks up the filename into fields, so it's easier to sort through the files. For more info, see the file naming convention articles below.

The file system generally lacks the ability to add extra fields of data. For example, it would be useful to be able to attach major and minor version numbers to every file. While there are some ways to do this, there isn't a simple way that's exposed through the user interface. Consequently, the folder hierarchy is usually used instead of extra fields. It's not a bad or good thing - it's just how we do it. For some examples of this, see the folder organizing articles below.

Good archiving can assist backups by breaking the file system into parts. For example, if the folders are organized by client, you can split up the backups by client. Then, you can direct archives for old clients onto specific media, which might be kept offline or offsite. With very little work, you can cut down the time required to back up adequately -- and that translates into a greater capacity for the entire backup system.

# File naming convention with dates

The file naming convention I use starts the name with a date:

YYMMDD-file-name.ext

If I'm making revisions, I add initials and revision numbers separated by a dot or a dash:

YYMMDD-file-name-x.ext or YYMMDD-file-name.x.ext

Similar conventions are used for folder names.

Though the system adds modification times, I still put the date into my file name, because the system's time and date can be lost. If a file is emailed, the creation date can be lost. Putting the date in the filename helps retain this extra data.

Using the date in the filename also helps with the naming. Typically, I'm working on things for other organizations or people (for money), so I can name a file with the date and the other party's name. As new files are created, I don't have to invent new file names. If there are multiple projects, just add the project name.
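The date prefix doesn't have to be typed by hand. On a Unix system it can be generated (a sketch; "client-report" and the extension are placeholders):

```shell
#!/bin/bash
# print a file name following the YYMMDD-file-name.ext convention
name="$(date +%y%m%d)-client-report.odt"
echo "$name"
```

Names built this way sort chronologically in any plain file listing, which is the whole point of the convention.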
The date assures that there's no need to invent new names all the time.

# File naming conventions for routing documents past multiple editors

In a typical office, several people have to read a document - the writers, the editors, a manager, the signatory to the document, and possibly some artists. In many offices, this is carried out over email.

The problems with this technique are multiple, but for the backup administrator, the main problem is that each mailed file consumes space in the mail server's file system. It wastes space and network resources. It also fails to scale up past small documents. Imagine editing long documents this way - it's not realistic.

The standard solution is to have everyone work on a shared file system.

Some offices use a system of "folders" where a document is edited, and versions are moved from one folder to another -- each folder acting as a kind of inbox and workspace. The folders within a project may be named "source", "edit", "review", "signed". Specific people look at each folder, and work on the contents within.

Some offices use project names, but others use project numbers. Numbers may actually work better than names, because people are generally good at mapping numbers to names, but not as good going the other direction (think about how much easier it is to see a phone number and identify the caller than it is to remember a phone number). Not only that, but numbers are more precise than words -- people won't mix up "9099" and "9080", but they may mix up "Ford" and "Ford Foundation" and thus create confusion.

Some offices alter the file name of a document as it's modified. For example, you start with a document named "2010-Tribe.doc". As it gets edited, the file accumulates editor initials: "2010-Tribe.a.doc", then "2010-Tribe.aj.doc", and so forth as each person reviews the work. Because the name changes, the backup software that runs every night will save each revision of the file separately.
Similarly, if you use file syncing software, you can accumulate revisions onto your backup.

# File naming conventions for websites

Websites are archives. A website that isn't an archive is one that displays a lot of "404 errors" - file not found.

Perhaps more than with other kinds of archives, it's important to plan the archive out to accept new files for a long period of time. That's because websites get links, or what some call "deep links", which are links to pages past the so-called "home page". (I think it's a stupid distinction - a home page is only for branding and frequent users, and there are few of the latter. Most traffic comes from links and search engines.) When you rename or move files, you break all the links out there. That's the fabric of the web.

To avoid this problem, you have to break up your system into manageable chunks, and you have to do it from the start. If you expect to upload new image files every day, you should plan to have a system that can handle 365 files per year, and 3,650 files per decade. A single folder might be sufficient for the first 365 files, but things get unwieldy at 3,650 files if you have to look at the files and pick them. Even the network will slow down when you get a file listing. The solution for that is to use dated folders.

Suppose you expect to get few images per year, except during events, when you get hundreds of images. The obvious solution is to create one folder per event. I like to prepend the year to the event, so you get names that sort by date, like 09picnic. If that's not precise enough: 090815-picnic.

Uppercase? You can use upper and lower case, but at your peril. Windows and Mac are case insensitive, but Unix is case sensitive. That means in Unix, "Car.jpg" is different from "car.jpg", and both are different from "car.JPG". On Windows and Mac, all three are the same file.
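Since Unix servers are case sensitive, web-bound files are usually named in all lowercase, and the rename can be scripted. A sketch (assumes GNU coreutils, and that no two names collide once lowercased; `mv -n` refuses to clobber if they do):

```shell
#!/bin/bash
# rename every file in the given directory to all-lowercase
lowercase_names() {
  local f lower
  for f in "$1"/*; do
    lower="$(dirname "$f")/$(basename "$f" | tr '[:upper:]' '[:lower:]')"
    if [ "$f" != "$lower" ]; then
      mv -n -- "$f" "$lower"   # -n: never overwrite an existing file
    fi
  done
}

# real use might look like:
# lowercase_names ./images
```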
The hazard is that you create the three files on Unix, and then copy them to a Windows or Mac machine, and end up with only one file (or an error). The convention is to use all lowercase for naming files on Unix. To avoid problems, rename your Windows files in lowercase if they are destined for the Web.

Separate HTML files from image files? Most websites keep all the images in an images directory, and the HTML files in other directories, or in the "root" of the server (the topmost directory). This is probably because a single HTML file tends to include more than one image. Thus, as the site grows, moving images into their own directory just makes sense - it's a quick fix to the problem of growth. Suppose each page includes three images. Then, each new page causes four files to appear on the server. 100 pages later, there are 400 files. By moving images into a directory, the 100 pages cause only 100 visible new files in the directory.

# Offline Email Archiving

If you keep old emails, and some of that information is sensitive, you should archive them offline on a computer that doesn't get connected to the network all the time. While this isn't failsafe, it does prevent intruders from accessing sensitive data on backups. (Need to explore backup security issues.)

Ways to set up offline backups vary. Below is an explanation of how to set up an ad-hoc system in Mozilla Thunderbird.

Set up a new account of type "Other Accounts...". To do this, go into an existing account's settings, and below the list of accounts, there's a drop-down that lets you "Add Other Account...". Choose the "unix mailspool" type. This is a file-based email drop. (Unix can deliver email on the local system via files.) Go through the rest of the configuration, and name it something like "offline archive".

Next, go into the "Server Settings" section of this account's settings. It will display the directory where the mail is stored. Click "Browse..."
and change this to a directory on an external hard drive. (The hard drive must be connected and powered at this point.)

Next, once established, move your data into this offline archive, using the regular Thunderbird methods of dragging and dropping.

# Backup Laptops with a Dock

If you have a laptop that you travel with, consider getting a dock for your office desk. If your laptop isn't dockable (because it's a "home" laptop computer), then get a universal dock. A universal dock is a dock with a USB connector and an internal USB hub. (You might call it a glorified USB hub.)

To the hub, attach a USB hard drive or USB flash drive. The USB flash drive is better, because it uses less power.

Get some "sync" software that synchronizes folders. Some will initiate a sync when the drive is connected. Windows users may use a tool like Allway Sync or FolderClone. Set it up to back up the My Documents folder, and perhaps the Desktop as well. Every time you dock and log in, the software should sync and back up your important documents to the USB flash drive.

See the article backup external hard disk or usb flash drive for more ideas.

# Backup Tapes

Backup tapes are a popular backup medium, but they have recently become more expensive than disk. It's cheaper to use hard disks for backup.

Backup tapes have some advantages. They are smaller than disks, so you can pack more into a box, and send it to an archival location. Usually, backup tapes are stored in a cool, dry room. They are more durable, in that a shock to a backup tape won't cause failure, whereas disks may have a head crash.

There are many different types of backup tapes, ranging from the old 9mm and serpentine formats to Travan and DAT. The main backup tapes out there are 4mm tapes used with DAT drives.

Enterprises (meaning businesses with scale and money) still buy backup tapes. Consumers (meaning everyone else) have moved on to disk-based backups. Backup tapes cost more per megabyte than disks, unbelievably.
I guess there are a lot of enterprises overpaying for their data.

# Backup to CD-R and CD-RW

Backup to CD shares a lot of the problems that backup to DVD has, with some interesting differences. The main difference is that CDs are around 1/7th the size of DVDs (roughly 700 megabytes versus 4.7 gigabytes), so you can't back up as much data. Consequently, the backup is "faster" because the data set is smaller.

So, CDs are basically not good for backing up your system, but are a great way to make archival snapshots of your work-in-progress. For example, if you wanted to retain 7 days of your past work, you can purchase 7 CD-RWs, and label them "Monday", "Tuesday", "Wednesday" and so forth. Put them in jewel cases, and then into a CD box. Each day, either at the start or end of a day, run a backup of your work, and then store it. It won't take more than 10 minutes. For your effort, you are rewarded with an archive of your most important data, at your fingertips.

# Backup to DVD

Creating backups on DVD-R or DVD-RW allows you to store up to 8.5 gigabytes of data (or 4.7 if you use single-layer DVDs).

The main advantages:

• low cost
• archival, by default
• widely supported, and readers are common

The main disadvantages:

• slow write speeds
• limited capacity
• data is easily damaged

If you are going to back up to DVD, get a SATA DVD burner. Chances are, you're only going to do a data backup, so make duplicates of all your installer CDs and DVDs first. Make a disc with all the downloaded installers, and all the serial codes. Then, back up the data. You may need to partition your data on the disk, and set up different backup jobs, to spread the backup across multiple DVDs. Prepare to spend a lot of time waiting.

Another disadvantage is that you can't always choose backup software, because the burner may not work with generic DVD burning software.

That all said, a DVD is very light, and easy to mail. It's a great way to make a weekly backup of a large project that can be sent off-site "just in case".
It also gives your client something solid in exchange for paying their weekly invoice.

# Backup to External Hard Disk or USB Flash Drive

A simple, transparent way to back up a personal computer is with an external hard drive or USB flash drive. You don't need special software to do this - just copy the files.

The real issue is getting your files organized so all your documents are saved to the disk in one simple motion. (See organizing your files.) Also, if you're a data-completist, you'll want to save the settings files (the .dotfiles in Unix, and the hidden Application Settings in Windows).

If you wish to automate the process, some of the best software to use is "sync" software that compares the copies to the originals, and updates the copies automatically. The program I use is Allway Sync. There are others as well, but I found the interface of Allway Sync easiest to comprehend.

External hard disks have two risks. One is that the power adapter may fail. Another is that, because the drive is in a mobile case, you can drop the disk and have a head crash. USB flash disks are less prone to damage, but it's possible to put one in your pocket, forget about it, and toss your pants into the washing machine, destroying the device. USB flash drives also tend to be fragile because they stick out of the USB port. If you want to install one permanently, get a cheap USB extension cord at the dollar store, and tape the drive to your case.

A good backup solution for someone who isn't computer savvy is sync software, a USB flash memory drive, and the aforementioned extension cord. Set it up for them, and tell them to store their documents in only one folder.
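On a Unix system, a crude version of the "sync" idea needs nothing more than `cp`. A sketch (assumes GNU cp; the example paths in the comment are placeholders):

```shell
#!/bin/bash
# One-way "sync" sketch: copy files that are missing from, or newer
# than, the copies already on the backup drive.
sync_to_drive() {
  local src="$1" dst="$2"
  mkdir -p "$dst"
  cp -ru "$src/." "$dst/"   # -u: only update older or missing copies
}

# real use might look like:
# sync_to_drive "$HOME/Documents" /media/usbstick/Documents-backup
```

Unlike true sync software, this never deletes anything from the copy, which for a backup is arguably a feature.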
# External Hard Drive Backup Tips

If you're going to use a large external hard drive, for archival or simple backup purposes, here are the pros and cons of different cases:

External "toaster"-type adapters. These are square blocks with a slot on top that accepts a SATA hard drive, and connects to your computer through USB or eSATA. The pluses are convenience, cost, and speed. The minuses are the risk of metal shorting out the drive electronics, and a lack of heat dissipation.

External case with fan. These are the best cases - until the fan fails. Then, it's not so great. My personal experience was that the fan failed after a year. The plus is the fan. The minus is the risk of the fan failing - potentially leading to a hot hard drive.

External case without a fan. These are the second best cases. The ads say that the case is designed to pull heat away from the hard drive. It works as advertised, but the heat must then be removed from the case, so the entire case needs sufficient ventilation. Pros: nothing to break. Cons: you still need to figure out a way to remove heat.

Power supplies: a universal problem. Generally, for whatever reason, the power adapters I've used with these external hard drives have been junk. They'll last 1 to 2 years, and fail. There's no simple solution out there, except to buy another adapter. Make sure the adapter is on a hard surface with good circulation. (A possible solution is to hack a PC power supply and use that to power all your electronics.)

# Unix Backup Scripts with Rsync

Rsync is a good way to create and maintain a "mirror" of specific folders on your Unix system. It's not good for archiving, for cloning disks, or for running a "full/incremental" backup system. What rsync does is compare two folders, and synchronize them.

The following command will back up my home folder to an external disk called "/media/extdisk":

```shell
rsync -a /home/johnk/ /media/extdisk
```

Of course, life cannot be that simple.
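One way to skip unwanted folders in a single rsync call is the `--exclude` option. A sketch (the folder names are illustrative; note the trailing slash on the source, which tells rsync to copy the folder's contents rather than the folder itself):

```shell
#!/bin/bash
# mirror a home folder, skipping the chaff with --exclude
backup_home() {
  local src="$1" dst="$2"
  rsync -av \
    --exclude 'Downloads/' \
    --exclude 'Freenet/' \
    --exclude 'Music/' \
    "$src"/ "$dst"/
}

# real use might look like:
# backup_home /home/johnk /media/extdisk
```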
I have some huge folders with a lot of chaff that I don't need. First, I don't want to back up my Downloads directory. Nor do I want to back up my Freenet storage, which is 10 gigs. I also have a 24 gigabyte Music folder, but don't want to scan that every single time I run a backup. Conversely, I want the Desktop folder backed up frequently.

The typical Unix way to handle this situation is to write a backup script. Here's my script. It's stored in the target backup directory, so it's not listed. I "cd" into the directory and run the script:

```shell
#! /bin/bash
rsync -av /home/johnk/Pictures/ Pictures
rsync -av /home/johnk/Sites/ Sites
rsync -av /home/johnk/Desktop/ Desktop
rsync -av /home/johnk/Documents/ Documents
```

The backup takes around two minutes to scan 24 gigabytes of data and back up the few new files that have appeared.

There are ways to execute this script when the disk is plugged in, but they differ based on OS. The system in Linux is called udev; it's fairly complex, and I'm still learning it.

# Backup to Floppy

Backing data up to floppy went out with the 1990s. Hardly any computers have floppy disk drives anymore. That's why it's important to gather your floppy disk based backups and move the data onto a hard disk. When you do this, you'll be shocked at how much data's been lost to floppy disk degradation.

You may need to clean your floppy disk drive heads if you are a smoker. If you can't find one of those cleaning disks, you can fake it by taking a floppy disk that you aren't going to need and pouring a little alcohol on both sides of the disk. Pop it in for three or four seconds while the disk spins against the heads, then pop it out. The risk here is that the diskette will shed material before the heads get cleaned - so use a new diskette.

Floppy disks are still useful for doing system installations or restorations, so you still see them on some back-office systems.
# Cloud Computing Backups

With more and more work being done "in the cloud" - web-based applications that store data on a remote server, edited through a web browser (or specialized client application) - you'll want to back up the remote data locally. The way to do this is to export the data using a tool that automates the process. For example, Google Docs Backup.

One of the nice things about Application Service Providers is that they save you from installing and updating software. The big risk is that they'll upgrade and leave your older documents unusable.

Legacy data in traditional backup scenarios is managed in two ways. One is pickling, where an entire system and software stack is retained to read the data. The other is to convert the data to newer, more useful formats, or to older generic formats. Cloud computing leaves you only the latter option. So, applications like Google Docs Backup try to convert the data to something generic.

# How to Backup Email

The simplest way to back up an IMAP email account, like a Gmail or AOL Mail account, is to use desktop client software. Two popular services, Hotmail and Yahoo Mail, don't support IMAP, so you're kind of "out of luck" with them. The rest seem to support IMAP.

Two popular IMAP mail clients are Outlook Express (now called Mail), and Thunderbird (from Mozilla). Both are nice because they allow you to create local files, and also save the email in industry standard formats like .eml and .mbox. You can also script Outlook Express to do some of your dirty work.

The typical backup solution is to create local folders -- folders that aren't stored on the server -- and copy the server's data into these local folders.

There are also IMAP sync tools that copy all the data from one IMAP account to another. These vary in speed, and most aren't fast enough for frequent backups, but they can be used to copy the data over.
If you don't have a second IMAP server (and you probably don't), consider using something like Debian to set up an internal mail server that's used only to hold backups.

# How to Backup Google Docs

TBD

# How to Backup MySQL on a Website

Usually, a web host will give you FTP access to a directory and a web interface. A button in the web interface will produce a .ZIP file with the database contents, and you can download it via FTP.

If you have shell access, and you run a Unix at home, and develop your own website, you can use this script. It dumps the remote database, and then loads it into a local copy of your database:

```shell
#! /bin/bash
echo
echo Dumping db to db.mysql.
echo Type your password.
ssh launion@webhost.net mysqldump -u uname '-p--remotepw---' database_name > db.mysql
echo Loading
mysql -u root '-p----password----' database_name < db.mysql
```

It's really just two lines of typing, but having it scripted is nice.

# How to Backup Websites

TBD

# Database Backup

The correct way to back up a database is to use a "mirror" or "replica" of the database. That's a server that's running a duplicate of the database, and perhaps also acting as a load-balancing server. These two servers are connected by a network, and as requests come in to the main server, they are either performed on the main server or passed on to the mirror. Update and delete operations are carried out on the main server, then executed on the mirror.

A typical scenario is to reserve a single IP address for hosting the service (on the LAN). The main database has this address. It also has a second IP address for inter-database communications. The second database has only the inter-db communications IP address. If the main database fails, the second machine "takes over" the IP address. This failover is combined with database replication. The free MySQL server has this feature, and it is described in replication. This provides maximum fault-tolerance.
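For MySQL specifically, replication is switched on in the server configuration. A minimal sketch of the my.cnf entries on the source server (assumption: classic binary-log replication; the ID and database name are placeholders):

```
# source server: give it a unique ID and turn on the binary log,
# which the replica reads to stay in sync
[mysqld]
server-id    = 1
log-bin      = mysql-bin
binlog-do-db = mydb
```

The replica gets its own distinct server-id, plus an account with replication privileges on the source; the details are in the MySQL replication documentation.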
A simpler backup, which doesn't have the advantages of a mirror, is to dump the contents of the database to text files and back those up. This is obviously a lot cheaper than purchasing a second database server. If you opt for the latter method, make sure that you can build a database server and load the data quickly.

For archival purposes, you may want to make a database dump regularly, compress it, and have it backed up with the rest of the files.

If you're backing up a database on a website, see the article How to Backup MySQL on a Website.

AttachmentSize
database.gif 13.02 KB

# Failure

Computers fail, eventually. There are hardware failures, and software failures.

Hardware is nice, and tends to fail one part at a time. So, if your system breaks, you can replace the part and be operational again. That's assuming that you can still purchase the part. You may have a spare, but does it still work? Some components, like electrolytic capacitors, can age and fail.

Software failures are harder to detect. Sometimes software failures are invisible when they happen, but manifest symptoms later, when bad data is revealed.

# Full Backup

A full backup is a backup of all the files. It's used in contrast to the incremental backup, which is a backup of files changed since the last backup.

A common problem with full backups of networks or large file servers is that they take a long time. Backing up 300 gigabytes of data can take over half a day (over a 1-gigabit ethernet network, to a SATA 3, RAID 5 NAS box). So, full backups are typically scheduled to run over the weekend, when fewer people are using the network.

If there's too much data, a full backup may not be possible. The only solution is to split the file system into separate branches, and back up the branches on different days.

Full backups are performed in conjunction with incremental backups, usually scheduled to run once a day in the evening.
A typical schedule is to perform one full backup each month, and then perform an incremental backup each evening. Generally, it's bad to schedule full backups that fall on the 1st, last, and 15th days of the month, because those are "paydays" and it's possible that accountants may need to use the computers. (I think that means the second and third weekends are best.) That said, backups are important enough to run even if someone's working on the weekend.

# Incremental Backups

An incremental backup is a backup of all the files that have changed since the last backup. Typically, you perform a full backup, then a series of incremental backups. You can perform full and incremental backups using tools like BackupExec or NTBACKUP.EXE. All commercial backup tools can do incrementals.

Incremental backups take far less space than full backups, and also take a lot less time to perform. In some situations, it's feasible to run them during the workday.

Restoration of files from an incremental backup is performed by restoring the latest version of a file. This is done automatically. Generally, previous versions can also be restored, so incremental backups also serve as a way to archive changes to the file system.

In many cases, incremental backups are better for archives than full backups. For one, files that are created and then deleted in the interval between full backups are never captured by the full backups alone; incrementals catch them. The problem is, basically, size: a full backup is the same size as all the files, and keeping incrementals as well requires the full size, plus space for all the incrementals.

Incremental backups after periodic full backups are the preferred way to perform backups.

# Inverted, Multiple Backups with USB Flash Memory

When you have multiple computers, you might want to put your files on a USB flash disk (aka a thumb drive or jump drive), and back up the data to your computers' disks.
Create a folder called "backups", and a folder within it called "usb", and copy your files into there. If you have automatic sync software, you can set it to back up the data frequently. Hard disks are so fast you won't even notice the backups. Set this up on all your computers, and your risk of data loss is nearly zero. Just let the data "settle down" and force a file copy before removing the USB flash memory.

You could also set up a similar scheme with a USB external disk. The only issue is that moving hard disks tends to damage them and lead to data loss.

If you have important data, consider using encryption software. Losing the USB flash drive with your vital data would be bad.

# Restore Specific Files from a Huge .BKF NTBackup.exe file

I like using NTBackup.exe on the old VMs, but discovered that if you don't keep up on the backup rotations, you will have a very hard time doing restores. The NTBackup.exe restore doesn't make it easy to restore all incrementals of a folder. It turns out there's a sideways solution with Unix.

Copy the huge BKF file to a Linux computer. If you don't have one, use VirtualBox and set one up in a VM. Next, download the attached application and build it. (Again, if you don't know how, you'll have to find a tutorial.) It's also here: http://gpl.internetconnection.net/

The command I used was like this:

```
./mtftar -f inputfile.bkf -o outputfile.tar
```

(Except I used the full path, and had the input and output on different disks, for speed.)

Then you use tar to extract a specific folder:

```
tar xvf outputfile.tar "F\:/path/to/restore"
```

The quotes help, as does escaping the colon. The tar file contains the DOS drive letter, unfortunately. The backslashes were converted to regular slashes.

Watch the names scroll up the screen. That sure beats using a mouse and clicking on icons! You might want to redirect that output to a file, so you can see what it restored, and check if there were any files overwritten.
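Relatedly, you can list which tar members a restore would touch before extracting anything. This is a Python sketch of that idea, not part of the original commands; the archive name and path prefix below are placeholders:

```python
import tarfile

def members_under(tar_path, prefix):
    """Return the names of tar members whose paths start with prefix."""
    with tarfile.open(tar_path) as tar:
        return [m.name for m in tar.getmembers() if m.name.startswith(prefix)]

# Example: preview a restore of one folder before running tar xvf.
# for name in members_under("outputfile.tar", "F:/path/to/restore"):
#     print(name)
```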
AttachmentSize
mtftar.tar_.gz 16.52 KB

# Secondary Backups

It's a good idea to run two sets of backups. For one, it's possible that the backup software can fail, leaving some data unsaved. It's also difficult to check the backups every day, so it's likely that some minor glitch could lead to several days without backups. You could run out of space on the backup device. A device may go offline and stay offline for no discernible reason.

Going more than one or two days without a functioning incremental backup is unacceptable. As more work is lost, there's a "network effect": because people depend on other people's work, you have to involve more people in the disaster recovery effort.

A cheap way to avoid this problem is to run two sets of backups -- a primary and a secondary backup -- with the full backups staggered, and with longer runs of incremental backups on the secondary backup. Store the secondary backup on a different device (a hard disk in your PC is a good place). This way, if the system fails on the primary, you can use the secondary to recover. If the secondary fails, you have the primary.

In my experience, you can run two backups and check them twice a week, and there is never a situation where both backups are failing, but there's occasionally a problem with one of the backups.

# Subversion (svn) as a Backup

If you're programming (or managing programmers), you can use Subversion or any other revision control system as a backup. If you're not using a revision control system at all, it's a good idea to start immediately. Not only is the repository (the system where the code is stored) a backup of the code -- it's also a way to roll back changes to your code. All the popular systems also enable team programming over the internet. The few hours spent learning the system pay off many times over in saved time and mitigated risk.
# Be Specific; The Inner Platform Effect

Here are some choice words about seemingly perpetual problems that emerge in software development.

http://en.wikipedia.org/wiki/Inner-platform_effect

The Inner-Platform Effect is the tendency of software architects to create a system so customizable as to become a poor replica of the software development platform they are using. ...In the database world, developers are sometimes tempted to bypass the RDBMS, for example by storing everything in one big table with two columns labeled key and value. While this allows the developer to break out from the rigid structure imposed by a relational database, it loses out on all the benefits, since all of the work that could be done efficiently by the RDBMS is forced onto the application instead.

http://en.wikipedia.org/wiki/Second-system_effect

In computing, the second-system effect or sometimes the second-system syndrome refers to the tendency to design the successor to a relatively small, elegant, and successful system as an elephantine, feature-laden monstrosity.

# Building a Micro-Business Site

A few people have mentioned that they need a website for their crafts business. This article explains how to get started.

## Websites

To open a website, you need to purchase a few separate things, and bring them all together. You can do it yourself, or ask your web host or designer to take care of it. Most will, for a small fee. Here are the things you need:

• A domain name. foobar.com, for example. You can register these at a "domain registrar." There are many companies doing this; I personally use godaddy.com because they're cheap.
• A web hosting company. They rent you some space on their web server, to host your site.
• A web site. This is the actual web site, which is a group of files on the web server. You create this on your own computer, and copy the files to the "live" web server.

## Domain Names

Chances are, you will want a name for your site like mygreatsite.com.
That name, "mygreatsite.com", is called a domain name. The name is part of the Domain Name System (DNS), which is a whole other thing that isn't covered here. To get a domain, you have to pay a company to "register" it for you. Typically, the fee to register a domain name is between $8 and $20 a year. There are many companies that register domain names, but the technology is very basic, and you should go with the cheap companies.

You can also ask your web host to register a domain for you. They may charge you a setup fee for doing this, and they will charge you the annual fee they get assessed from the domain registrar.

## Web Hosting Companies

Web hosting companies own computers configured to be web servers. You will rent usage of one of these computers. There are basically three sizes of web host: Extra Large, Medium, and Tiny. There are pros and cons to each type.

Extra Large hosts like Yahoo and GoDaddy tend to be the most reliable. They also tend to have little tech support (of any value), but have low prices for simple websites, and high prices for websites with software apps like an ecommerce shopping cart.

Medium-size web hosts also tend to be very reliable. They sometimes have better customer service. What they really excel at is supporting you in adding content management software to your site. Generally, they can help you configure the software. The downside is that they cost a little more than Extra Large.

Tiny web hosts tend to be a few computers in a data room somewhere. They can have pretty high levels of reliability, but it depends on the company. They generally are run by 1-5 people, and don't have 24/7 tech support. They may also have server downtime. On the other hand, they are more likely to provide better advice, because they are likely to be operated by content developers or computer programmers. Also, tiny hosts tend to focus on specific niches, and will update and support the software. So, if you do crafts, go with a crafts host.
If you do politics, go with a politics host. Whatever you do, if you find a web host who specializes in your niche, use them. They'll set you up with the right web software, and may know some designers.

## A Web Site

Some people think that getting a web host means that you'll also get a "web site," but that's not how it works. The web host is like a landlord, and is just renting you the space — it's up to you to fill the space. You fill it with your web site — a collection of pages (and potentially, some software).

You can pay someone to design and implement your site, or you can do it yourself. I tend to think everyone should start off by downloading a copy of NVU, and then try to make a site. Even if you don't end up doing it yourself, the few hours you put in cobbling something together will help you work with your designer. NVU is okay. It's not great, like Dreamweaver, but it's FREE.

## Ecommerce

When people talk about ecommerce, they usually mean a "shopping cart" and a catalog CMS.

## Search Engine Optimization

I'm not an authority on SEO (as this site demonstrates with its low traffic), but there are a few things you should do to help your site rank higher.

### Add Content

One of the ways Google ranks sites is by how "authoritative" the site is. Authority is measured by how many other sites link to your site. The more authority you have, the higher your site will appear in the search listings. The best way to gain authority is to write a little bit of content, and have people link to it. One way to do this is to create a website within your website that has information about your field. If there's a wiki about your subject, add yourself to it. If there's no wiki, maybe consider starting one.

### Get Links

You should go to sites that would be interested in yours, and ask for a link. Make them link to your home page, if possible. If that's not possible, then to a product page.
You can also fool Google by creating a website elsewhere, and having it link to your site. This is invariably more work, though, because you have to also promote this other site. However, it might work if you're just starting out. It's usually cheap, because you can get a bunch of FREE websites from your ISP. They're usually included with your subscription.

# Burned a DVD but it Won't Play in the DVD Player

First, you have to assume that a given disc or file will not play on the target device. I know that sounds stupid. It doesn't make sense, because "in the real world" you can buy a DVD and play it in a player. The problem is that the computer sphere is different, and there's no guarantee.

The DVD sphere is dominated by the MPEG standards consortium, in which the Fraunhofer institute is a major player, so all DVD players conform to MPEG. Fraunhofer plus associated companies are a kind of cartel.

The computer market for video is not a cartel. There are competing companies: Apple, Microsoft, Real Networks, Fraunhofer, Sorenson, and a bunch of other companies I don't know. There are also companies like Adobe, Sony, Canon, and chipmakers like Zoran. There are open source projects like Ogg. Each company has one or more "codecs," or encoders and encoding schemes. These encoding schemes are mutually incompatible. Each company wants to monopolize some video, whether it's online, broadcast, or whatever. They want to dominate a niche, and grow it to overtake the larger market (and become the next Fraunhofer, licensing products to their competitors).

Also, popular encoders like MPEG-2 (an MPEG encoding) have multiple vendors. Each vendor pays a little money in licensing fees for the technology, but they make their products independently. Again, each codec company wants to position itself to dominate a niche. Presumably, they each wish to sell out to a larger company with a bigger market, and be the encoder for that larger niche. (I may be wrong here.)
Each big company, like Apple or Microsoft, has an interest in playing (almost) everything, but producing video that plays well only on their own technology, not the competing technology. They make a concession to MPEG, and generally will be able to produce something that might work with DVD players. It's somewhat like the problem with competing mobile phone carriers: a few large players competing against each other create a wide diversity of small incompatibilities, to get customers to switch out of frustration.

The way I've dealt with the problem of competing systems is this:

To extract DVD video, copy the DVD video files to a hard disk, and use AviDemux. It reads many formats, including the VOB files on DVDs.

On Macs, use Apple's QuickTime. On Windows, use Windows Media. On Linux, use MJPEG. Edit the video, and then produce a video file using Apple, Windows Media, or MJPEG (or Ogg), respectively. On Windows, I make a huge WMV file at the best quality.

Then author a DVD using DVD Flick, an open source product based on FFmpeg, to produce the DVD files. FFmpeg seems to read a lot of formats, and produces an MPEG-2 file that works with DVD players. It seems to work better than the Apple and Windows options.

Then use Nero or Sonic or other DVD burning software to write out the DVD.

Then, for each burned copy, watch it on a DVD player, not the computer, to make sure it plays. The first copy gets watched all the way through, to look for problems that cause playback to fail.

# C# and VB.net Comparison

This is a link to an awesome resource: a "rosetta stone" comparing VB and C#. Totally useful as a general reference, too. One of these days, we need columns for PHP, D, C++, Javascript, etc.

http://www.harding.edu/fmccown/vbnet_csharp_comparison.html

# Careful File Copy

This area of the site was getting really chatty, so I've removed it from the Software book, and moved it under DIY notes. The remainder of the experience will be put on the blog.
Consider all the material on this page obsolete.

## Prologue

I was asked to help migrate a large batch of ArcMap GIS files to a new server. The problems: the files contain references to other files, and all those files must also be copied. Also, these files are mixed in with other files, not pertinent to GIS, on a single server. To manage growth, it's necessary to move the GIS files out.

Also, it's not a simple file copy. ArcMap can store the references in absolute or relative form. At this office they were stored as absolute paths, because that's more reliable. Thus, it's necessary to flip this bit to "relative," copy the file over, and then re-flip it back to "absolute."

Due to the large number of files, and the slow speed of ArcMap, I decided to try to script the process. This sub-site details some of what I've learned in the process.

For a little more info: http://www.acadweb.wwu.edu/gis/tutorials/ArcMap_File_Mgmt.htm

## Part 1

These scripts are the product of the first iteration of this project. It succeeded in some ways, but failed in others. It succeeded in processing around 150 files before it crashed and failed to process the remaining 1,400 or so. I concluded that a longer-term project was feasible, due to the tedious and slow nature of this task. (That is, the tedium of copying files exceeded the tedium of reading hundreds of pages of VB help docs, which use "enterprise"-style code examples that usually don't do anything useful, as presented.)

### The code

Taken together, these scripts form an almost-functional system. Some of the scripts are installed in an Excel document, and others are installed into the Normal.mxt template in ArcMap. The Perl file copier script should be run as a scheduled task. It runs under ActivePerl.
The big problems, so far: VBA doesn't handle OLE server timeouts well; ArcMap chokes on some files; the scripts use the IMxDocument interface instead of IMapDocument, which might be faster; and the scripts pause for one minute while the mxd file loads, instead of polling the app to see if the file is loaded.

The small problems, so far: it'd be better to have the file copier written in VBA; the file format for the manifest files (generated by a script from the ESRI dev site) should be written for computer processing; and using Excel as the process db is kind of cheesy.

Being a noob, I didn't realize that Microsoft's idea of "Automation" was not very thorough. OLE Automation, as implemented with Excel and ArcMap, isn't stable enough to do real batch processing. With VB (not VBA) driving ArcMap, I suspect it's possible, but ArcMap will still not provide good error handling.

The code below has the following interesting features:

• a process find-and-kill function
• a recursive file system scanner
• an Excel-based tool to apply a macro to many files
• a little error handling
• mxd file analysis

### References

The LayerSourceArray code from the ESRI dev site.

http://search.cpan.org/~jdb/libwin32-0.26/OLE/lib/Win32/OLE.pm -- about using multiple interfaces via scripting (you don't)

http://support.microsoft.com/kb/193247/en-us?spid=2513&sid=946

http://www.microsoft.com/technet/scriptcenter/guide/sas_wmi_jgfx.mspx?mfr=true

AttachmentSize
sapphos.bas.txt 1.29 KB
filecopy.pl.txt 6.1 KB
FileBatcher.cls.txt 2.82 KB
FileSystemScanner.cls.txt 1.56 KB
main.bas.txt 4.84 KB

# More VBA Sample Code

Here's some more code to use.

```vb
Sub test()
    Dim pDoc As IDocument
    Dim pApp As IApplication
    Set pDoc = New MxDocument
    Set pApp = pDoc.Parent
    pApp.Visible = True
    pApp.OpenDocument ("G:\1217\1217-014\GISFiles\SEIFiles\ArcGISProjects\FieldTransects2.mxd")
    pApp.RefreshWindow
End Sub

Sub setRelativePaths()
    Dim pMxDoc As IMxDocument
    Set pMxDoc = ThisDocument
    pMxDoc.RelativePaths = True
End Sub
```

```perl
#!perl
use strict;
use Win32::OLE qw(in with);
use Win32::OLE::Const 'ESRI ArcMapUI Object Library';
use Data::Dumper;

# my $class = 'esriCarto.IMapDocument';
# my $class = 'esriArcMap.Application';
# my $class = 'esriFramework.IApplication';
# 'esriArcMapUI.MxDocument'

# print Dumper( Win32::OLE::Const->Load('ESRI ArcMapUI Object Library') );

my $pDoc = Win32::OLE->new( 'esriArcMapUI.MxDocument', 'Shutdown' );
# || die Win32::OLE->LastError() . " no $class";

print Dumper( $pDoc );

my $pApp = $pDoc->Parent();
$pApp->{Visible} = 1;

print Dumper( $pApp );

$pApp->Shutdown();

exit;

# unreachable scratch code below
$pApp->{Visible} = 1;
$pApp->OpenDocument( '' );
```

```vb
Private Sub test()
    Dim pDoc As IDocument
    Dim pMxDoc As IMxDocument
    Dim pApp As esriFramework.IApplication
    Dim pDocDS As IDocumentDatasets
    Dim pEnumDS As IEnumDataset
    Dim pDS As IDataset
    Dim pWS As IWorkspace

    ' get a ref to a new ArcMap application
    Set pDoc = New MxDocument
    Set pApp = pDoc.Parent

    ' Loop thru your .mxd documents here

    ' Open an existing document
    pApp.OpenDocument "c:\MyMap.mxd"
    Set pMxDoc = pApp.Document

    ' Iterate thru the datasets and display details
    Set pDocDS = pMxDoc
    Set pEnumDS = pDocDS.Datasets
    Set pDS = pEnumDS.Next
    While Not pDS Is Nothing

        On Error Resume Next
        Set pWS = pDS.Workspace
        If Err.Number = 0 Then
            Debug.Print pDS.Workspace.PathName + " : " + pDS.Name
        Else
            Debug.Print pDS.BrowseName + " : Error with datasource"
        End If
        On Error GoTo 0

        Set pDS = pEnumDS.Next
    Wend

    ' End of your loop

    ' Shut down the ArcMap application
    pApp.Shutdown

End Sub
```

--------------------------------------------------------------

```vb
Sub muliplemxds()

    Dim sDir As String
    Dim sFile As String
    Dim DocPath As String
    sDir = "C:\Myfolder\TestFolder\"
    sFile = Dir(sDir & "*.mxd", vbNormal)

    Do While sFile <> ""
        DocPath = sDir & sFile
        OpenMXDDoc DocPath

        sFile = Dir
    Loop

End Sub

Private Sub OpenMXDDoc(sFileName As String)
    On Error Resume Next

    Dim pDoc As IMapDocument
    Set pDoc = New MapDocument

    pDoc.Open sFileName

    Documentation pDoc

    pDoc.Close
    Set pDoc = Nothing

End Sub

Private Sub Documentation(pMxDoc As IMapDocument)
    Dim mapcount As Long, LayerCount As Long, text As String
    text = ""
    Dim pLayer As ILayer
    Dim pFL As IFeatureLayer
    Dim pRL As IRasterLayer
    Dim pFC As IFeatureClass
    Dim pDS As IDataset
    Dim pMap As IMap
    text = text & vbCrLf & pMxDoc.DocumentFilename
    For mapcount = 0 To pMxDoc.mapcount - 1
        Set pMap = pMxDoc.Map(mapcount)

        For LayerCount = 0 To pMap.LayerCount - 1
            Set pLayer = pMap.Layer(LayerCount)
            If TypeOf pLayer Is IFeatureLayer Then
                Set pFL = pLayer
                Set pFC = pFL.FeatureClass
                Set pDS = pFC
                text = text & vbCrLf & pFC.AliasName & vbCrLf & pDS.BrowseName & vbCrLf & pDS.Workspace.PathName
            ElseIf TypeOf pLayer Is IRasterLayer Then
                Set pRL = pLayer
                text = text & vbCrLf & pRL.FilePath
            Else
                text = text & vbCrLf & pLayer.name
            End If
        Next
    Next
    WriteToTextFile "C:\textfile.txt", text

End Sub

Sub WriteToTextFile(sFileName As String, text As String)
    Dim fso
    Set fso = CreateObject("Scripting.FileSystemObject")
    'Set fso = New Scripting.FileSystemObject
    Dim ts
    'Create the file if it doesn't exist; if it does, append to the current file
    Set ts = fso.OpenTextFile(sFileName, 8, True)
    ts.WriteLine text

    ts.Close
    Set ts = Nothing
    Set fso = Nothing

End Sub
```

-------------------------------

```perl
use Win32::OLE;

my $class = "esriGeoprocessing.GpDispatch.1";
my $gp = Win32::OLE->new($class) || die "Could not create a COM $class object";
$gp->{overwriteoutput} = 1;
print $gp->{overwriteoutput};
```




# New Manifest Format

Note: this information is obsolete.

I'd written a parser to read those manifest files, but the format was kind of irregular, so I'm going to spend an hour making it more regular. That way, it's easier to parse, and safer to parse, ultimately. The new manifest format is CSV.

The first field is the type:
• m means the mxd file
• s means a shape, and implies that the related files must be figured out later
• f means a file

The second field is the name.
The third field is the filename.

We'll assume that no stray quotes or commas are in the data.

```
m,,G:\blah
s,shape name,G:\blah
f,,G:\blah
```


Use perl/bin/ppm.bat to install Text-CSV-Simple to get a csv reader. Attached is some untested code that reads a manifest and returns a list of files, sort of. There's still an issue of getting all the different types of shape files. It's just not fully determined.
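For illustration, the same reader can be sketched in Python's standard library (hypothetical code -- the notes above use Perl with Text-CSV-Simple); it handles only the three-field format described here:

```python
import csv

def read_manifest(path):
    """Parse a manifest CSV into (type, name, filename) tuples.

    Types: 'm' = the mxd file, 's' = a shape (its related files must
    be figured out later), 'f' = a plain file.
    """
    rows = []
    with open(path, newline="") as f:
        for record in csv.reader(f):
            if len(record) == 3:
                rows.append(tuple(record))
    return rows

def files_in_manifest(rows):
    """Return just the filenames (the third field) from parsed rows."""
    return [filename for (_, _, filename) in rows]
```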


# OLE/ActiveX Scripting Notes

I'm still working on this. These are just notes, and I'm a noob.

The ESRI ArcObjects don't fully support scripting. They support some basic level of scripting, but they don't fully support scripting via contemporary OLE Automation, which is what Perl and other languages use.

Historically, there are three phases of COM/OLE that should help explain this situation a little.

First is COM. COM is a way to factor applications into objects that can be used across languages. Normally, you're constrained by the language.

Second is DCOM or OLE. OLE, and later, Distributed COM allowed for the objects to be located on different computers, or within another application. You could issue a method call to a remote program. The technology to do this involved "interfaces". An interface, in this situation, is a lightweight object that communicates with a remote concrete class, aka, coclass. The interface presents a "local face" for the remote object. To access the objects, you "instantiate an interface." Complex objects typically implement several interfaces, and, to access such an object, you needed to instantiate each interface separately, and then set the instance to the object.

```vb
Dim foo As IFooThing
Set foo = CreateObject("Foo.FooThing")

Dim bar As IApplication
Set bar = foo
```


The first two lines set up an object called foo that is accessing Foo.FooThing via the IFooThing interface. The last two lines set up the bar object to also access Foo.FooThing, but via an IApplication interface.

Third is ActiveX and scripting. This is where we are today. Scripting requires a single interface to the entire object. ActiveX objects have a single interface to the entire object, called IDispatch.

Historically, much of the ESRI application suite is stuck back in the second period, where the objects lack an IDispatch implementation. Thus, ArcGIS apps are difficult to drive with scripting tools that expect it.

The alternatives are to use VB for Applications, .NET, Java, C++, and VB.

I'm not certain if Python has support for COM interfaces. I believe it does, according to what some sites say (via the pywin32 package's win32com module, apparently).

# Part 2: a VB.NET Version of this Project

After a while, it became obvious that there was no way to drive the ArcMap application from Excel -- timeouts from errors wouldn't get handled, so bad runs would hang.

A real app could raise errors on timeouts, so I had to learn VB OLE programming. Fortunately, there's a free version of VB called Visual Basic Express Edition. It's a complete VB environment that uses .NET. Unfortunately, references for the old VB classes aren't included. .NET is, in parts, a bit more complex than VB - it's a victim of feature-itis. There are also fewer VB.NET tutorials out there.

Here's a diagram of the "new" system, which is, mostly, going to be an iteration of the "old" system.

The app is broken into three parts. One part manages a list of files. One part is a bunch of "scripts" that do the actual work of analyzing, copying, and deleting files. One part is a scheduler that will run the scripts only at specified times, so that it won't interrupt the normal workday.
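The scheduler part can be sketched as a simple time-window check that the batch loop consults before starting each file. This is a hypothetical illustration in Python, not the project's VB.NET code, and the 7pm-6am window is a made-up value:

```python
from datetime import datetime, time

def in_run_window(now, start=time(19, 0), end=time(6, 0)):
    """True if 'now' falls inside the overnight window when scripts may run.

    The window wraps past midnight, so it's "after start OR before end"
    rather than a simple between-check.
    """
    t = now.time()
    if start <= end:
        return start <= t <= end
    return t >= start or t <= end

# The batch loop would check this before each file:
# if in_run_window(datetime.now()):
#     process_next_file()
```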

## File Batching

This code fits into the larger goal of a project that will reliably run an application on a set of files, over the course of several nights.

The first thing I've written, so far, is something that will scan the file system for file names, to create a "batch". The batch is stored in a Microsoft Access .mdb file.

The coolest feature is that you don't need Access to run it. It creates the .mdb file from scratch, and inserts data into it.

Another cool feature is the call to System.IO.Directory.GetFiles. That does all the scanning that, in the original project, required custom code.

This is very alpha code, but, it might help someone out there.
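As an aside, the same recursive scan is a one-liner in other environments too. Here's a Python sketch (not part of the project; the helper name is mine):

```python
from pathlib import Path

def scan_for_ext(root, ext):
    """Recursively collect file paths under root matching *.ext, much like
    System.IO.Directory.GetFiles with SearchOption.AllDirectories."""
    return sorted(str(p) for p in Path(root).rglob("*." + ext))
```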

### FileBatch.vb

```vb
Imports System.Data
Imports System.Data.SqlClient

Public Class FileBatch

    Private Const StatusNone = 0
    Private Const StatusProcessed = 1
    Private Const StatusSkip = 2

    Private Sub CreateNewDatabase(ByVal dbPath As String)
        ' These objects come from ADOX; add a COM reference to
        ' "Microsoft ADO Ext. for DDL and Security" to the project.
        Dim dbCatalog As New ADOX.Catalog
        Dim objFirstTable As New ADOX.Table

        ' delete the file first
        If System.IO.File.Exists(dbPath) = True Then
            System.IO.File.Delete(dbPath)
        End If

        dbCatalog.Create("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & dbPath)

        objFirstTable.Name = "FileBatch"
        ' The columns that CreateBatch's INSERT statement expects.
        objFirstTable.Columns.Append("File", ADOX.DataTypeEnum.adVarWChar, 255)
        objFirstTable.Columns.Append("DestinationFile", ADOX.DataTypeEnum.adVarWChar, 255)
        objFirstTable.Columns.Append("Status", ADOX.DataTypeEnum.adInteger)
        objFirstTable.Columns.Append("ProcessingDate", ADOX.DataTypeEnum.adDate)
        objFirstTable.Columns.Append("Comment", ADOX.DataTypeEnum.adVarWChar, 255)
        objFirstTable.Keys.Append("PK_File", ADOX.KeyTypeEnum.adKeyPrimary, "File")

        dbCatalog.Tables.Append(objFirstTable)

        'cleanup
        dbCatalog = Nothing
        objFirstTable = Nothing
    End Sub

    Public Function CreateBatch(ByVal dbPath As String, _
                                ByVal pathStart As String, _
                                ByVal ext As String, _
                                Optional ByVal statusBox As TextBox = Nothing) As Integer
        Dim ar, element

        CreateNewDatabase(dbPath)

        If statusBox IsNot Nothing Then
            statusBox.Text = "Scanning for *." & ext & " in " & pathStart & "."
            statusBox.Refresh()
        End If

        ar = System.IO.Directory.GetFiles(pathStart, "*." & ext, IO.SearchOption.AllDirectories)

        Dim cs
        Dim conn As OleDb.OleDbConnection
        Dim command As OleDb.OleDbCommand
        Dim sql As String

        cs = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & dbPath
        conn = New OleDb.OleDbConnection(cs)
        conn.Open()

        For Each element In ar
            ' Double any single quotes so paths containing them don't break the SQL.
            sql = "INSERT INTO FileBatch ([File],DestinationFile,Status,ProcessingDate,Comment) VALUES ('" _
                & Replace(element, "'", "''") & "','',0,#1/1/1899#,'')"
            ' Console.WriteLine(sql)
            command = New OleDb.OleDbCommand()
            With command
                .Connection = conn
                .CommandText = sql
                .ExecuteNonQuery()
                .Dispose()
            End With
        Next
        conn.Close()

        CreateBatch = 1
    End Function

End Class
```


Here's the code that calls it (from a form button):

```vb
Private Sub Button1_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles Button1.Click
    Dim fb As FileBatch
    fb = New FileBatch
    fb.CreateBatch("C:\tmp\text.mdb", "C:\Documents and Settings\johnkuser\", "jpg", Me.StatusMessage)
    Close()
End Sub
```
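As a side note, the same batch-creation idea can be sketched in Python with SQLite, using parameterized SQL. This is hypothetical code, not part of the project; one thing it shows is that SQL placeholders sidestep the quoting problems that string-concatenated INSERTs have with paths containing apostrophes.

```python
import sqlite3
from pathlib import Path

def create_batch(db_path, path_start, ext):
    """Recursively scan path_start and store one FileBatch row per *.ext file."""
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS FileBatch (
        File TEXT PRIMARY KEY, DestinationFile TEXT,
        Status INTEGER, ProcessingDate TEXT, Comment TEXT)""")
    for p in Path(path_start).rglob("*." + ext):
        # Placeholders handle quoting, so odd characters in paths are safe.
        conn.execute(
            "INSERT OR IGNORE INTO FileBatch VALUES (?, ?, 0, '1899-01-01', '')",
            (str(p), ""))
    conn.commit()
    conn.close()
```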


### References

http://www.4guysfromrolla.com/webtech/013101-1.2.shtml

AttachmentSize
Form1.vb.txt 343 bytes
Form1.Designer.vb.txt 2.24 KB
FileBatch.vb.txt 2.49 KB
filebatcher.jpg 8.02 KB

# Some COM and .NET Notes

This document explains some terminology used on other pages.

ActiveX
A technology layered on OLE that supports an interface, IDispatch, that executes method calls by name (by a string argument). IDispatch solved the problem of scripting languages being late bound, and not able to handle multiple interfaces. ActiveX also covered other technical things, but the IDispatch feature is relevant to this topic.
Assemblies
A group of classes. The classes generally work together, and form a namespace. Analogous to a Java package. The .NET assemblies are analogous to the Java class libraries.
CLR, Common Language Runtime
A "virtual machine" that executes programs compiled to CIL (Common Intermediate Language, also called MSIL), a platform-neutral assembly-like language produced by compilers. The CLR is also called a "managed environment" because the virtual machine takes care of many runtime issues, like allocating memory.
COM - Component Object Model
Microsoft's object technology that allows code objects written in different languages to interact with each other. The idea was that you could instantiate an object written in C++ from within VB.
OLE - Object Linking and Embedding
A technology layered on COM that defined how independently running objects would interact with each other. One example is how code in MS Excel can execute a macro in MS Word.
Late Binding - Dynamic Typing
The type of an object is not known until it is used. This contrasts with early binding, or static typing, where you declare that an object is of a specific type, first, then use it. Early binding in the COM environment is used when you declare that an object uses a specific interface. That allows the compiler to check that your method calls conform to the interface.
Managed
See CLR. Managed code is any code that runs within the CLR. The execution is "managed" because the CLR takes care of things like memory allocation and threads.
Multiple Interfaces
The technique used by MS VB and COM to implement objects. An object implements an interface, and may implement more than one. To interact with the object, you instantiate the object with the specific interface, and that defines how you interact with it.
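To make the late-binding idea concrete, here's a small Python sketch (the class and method are hypothetical, not from any real COM library). Python resolves attribute names at runtime, much like IDispatch resolves method names from strings:

```python
class Document:
    """A stand-in object; the class and method names are made up."""
    def word_count(self):
        return 42

# Early binding: the call is written against a known type, so a
# compiler (or type checker) could verify that word_count exists.
doc = Document()
print(doc.word_count())

# Late binding: the method is looked up by its name -- a string --
# at call time, roughly what IDispatch's by-name invocation does.
method_name = "word_count"
print(getattr(doc, method_name)())
```

Both calls do the same thing; the difference is only *when* the name is resolved.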

# Change Web Host Company Without Downtime (Linux or BSD oriented)

This outlines how to change web hosts with minimal downtime. It won't go step by step or explain too much.

I'm using Hurricane Electric as my web host. I've been happy with them, especially after trying other web hosts that didn't deliver the level of performance I needed. (I'm a return customer, having used them back in the 90s.)

Gather all passwords, making sure you can get into your accounts to manage domains, web server files, databases. Get these all in a single text file, for convenience.

Move the files over. If you can, use a tool like rsync, and run an rsync server on the originating server (or on the computer with a staging copy of the site). If you have shell access, you can write a script to sync the files. Here's a bit of my script to sync:

    #! /bin/bash
    # this syncs everything except images
    # necessary because rsync dies when there are too many files
    for fn in action_icons admin atom audio authors calendar
    do
        rsync -vr rsync://zanon.slaptech.net/launionaflcio.org/docroot/${fn}/ public_html/$fn/
    done


Move the database over. Again, if you have shell access, you can do this with a command like this:

    #!/bin/bash
    echo "getting database from remote"
    ssh foobar@slaptech.net ./dump_mydb_org | mysql -pasdfsdf mydb


dump_mydb_org is a script that calls mysqldump with the correct username and password to dump mydb.

To get the application running on the new server, edit your local HOSTS file and create a line for your website. Typically, your website is hosted on www.mysite.com and mysite.com. What you will do is override the DNS, and create a new record for www.mysite.com. Set www.mysite.com to the new IP address.

In C:\WINDOWS\SYSTEM32\DRIVERS\ETC\HOSTS:

65.49.111.111 www.mysite.com


Going forward, www.mysite.com will point to the new server. You can now get the application working. Usually, that means altering configuration files so they can get to the new database.

If you have web stats, make sure they are working.

If you have to make updates to the old site, keep doing so, until Friday afternoon. On Friday afternoon, log into the new server, and run both sync scripts to sync the files and the database.

Then, alter the DNS records so your domains point to the new server.

In two to three days, nearly all the DNS records for your site will change over to the new server. On Monday morning, the new site should be getting all the traffic. From Friday evening to Monday morning, you should avoid updating the database. If you must, then, figure out some way to make sure you're only touching the new website and database. Maybe put a file on the new server and read it through the web browser.

On Monday, download the logs from the old site. Shut it down, but don't delete anything. You might need to switch back if the host turns out to suck.

# Changing Windows 2000 Professional to Windows 2000 Server

The main reason to do this is to allow more than 10 clients to connect to your computer. Beyond that, Win2k Pro doesn't come with all the applications and services that Win2k Server includes.

Info stolen from: http://www.commodore.ca/news/2002/mar30_02.htm

Week Ending March 30, 2002

Change Windows 2000 Pro To Windows 2000 Server with Freeware Util
NTSwitch is a small freeware program that allows you to turn an existing NT Workstation or Windows 2000 Professional installation into an NT Server or a Windows 2000 Server environment.

It's well-known that Workstation and Server environments are virtually identical. The operating system decides which "flavor" to run in based on two registry values:

* HKLM\SYSTEM\CurrentControlSet\Control\ProductOptions - ProductType [REG_SZ]
* HKLM\SYSTEM\Setup - SystemPrefix [REG_BINARY 8 bytes]

ProductType is "ServerNT" or "LanmanNT" for servers, and "WinNT" for workstations. The third bit in the last byte of the SystemPrefix value is set for servers, and cleared for workstations.
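As a hedged illustration of the two checks described above (this manipulates plain Python values, not the real Windows registry, and the function names are mine):

```python
# Illustration only: models the two registry values the article
# describes, without touching any actual registry.

SERVER_TYPES = ("ServerNT", "LanmanNT")

def make_server(system_prefix: bytes) -> bytes:
    """Set the third bit (value 4) of the last byte of SystemPrefix,
    which the article says marks a server install."""
    prefix = bytearray(system_prefix)
    prefix[-1] |= 0b100   # set the third bit of the last byte
    return bytes(prefix)

def looks_like_server(product_type: str, system_prefix: bytes) -> bool:
    """True when both values have their server form."""
    return product_type in SERVER_TYPES and bool(system_prefix[-1] & 0b100)
```

This is only the bit-twiddling half of the story; as the article explains, the watcher threads would revert such a change made directly.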

Since the release of NT4, Microsoft has taken measures to keep the user from changing these registry values. The operating system has two watcher threads that revert any changes made to these two registry settings, as well as warn the user about "tampering".

The good guys at Sysinternals supposedly created an application called NTTune. They released it only to the press, not to the public - their intent was to demonstrate the fact that there's really no difference between Server and Workstation. The application disabled the system threads, thus letting the user change the aforementioned registry values.

The public is curious - people came up with a way of changing these settings without NTTune. Details are here. It involves hacking the NTOSKRNL.EXE executable so that the watchdogs are looking at some other registry setting. While this works, it's definitely not for the faint of heart.

Our utility, NTSwitch, is not as slick as NTTune - it does not disable the system threads. It's not as horrible as the NTOSKRNL.EXE hack either.

Our approach is the following:

* Backup the SYSTEM hive of the registry using the registry API.
* Edit the information contained in the backup file.
* Restore the registry from the backup.
* Reboot the computer so that the changes can take effect.

A quick-and-dirty hack. It works, and it's at least as safe as the two previous solutions. We're giving it away for free. Go here to download it. The readme.txt contained in the zip file might have some late-breaking information; be sure to read it.

# Cheapskate Developers Mobile Phone Tips

I feel lame when it comes to mobile phone hacking because I'm so far behind the state of the art, by at least five years. The only good thing about this is that, generally, only games have taken off on phones, leaving the universe of practical applications almost untouched. This is a newb article, so, if you're experienced, go away :)

The problem with programming J2ME devices is that there are so many devices, and they generally support a different subset of J2ME features or APIs (called JSRs). I recently got an LG600g and found out that it's a junk phone for Java hacking. It doesn't support many JSRs.

I wanted to try writing code to extract data from the PIM. No such luck, because the PIM JSRs aren't supported on this phone. In fact, few JSRs are supported at all. A little sleuthing revealed that this phone looks like an LG KP210.

LG documents the JSRs supported by the KP210 on their developer site, developer.lgmobile.com. There, you can download SDKs for LG J2ME development.

A similarly priced (cheap) phone called the Motorola i290 from Boost mobile is a different story. If you go to the Moto site, there is a complete SDK for the i290, and a lot of different JSRs are supported. Moto is very programmer-friendly, and they have an exemplary site.

Regardless, I'm stuck with the LG until the minutes run down. So, I'll program for that, for now. The limitations are a challenge.

# Comment Styles for C-Style Code

If there's anything that annoys people more than funky indentation, it's bad comments. I don't mean comments about the code, but comments in the code.

    function name() {
        /* Once upon a time, all my comments were inside the functions. */
    }

It seemed to make sense, but there's something that sucks about having to scroll more to start reading code.

    /* Moving it up above the function seems to help! So I did this for a while. */
    function name() {
        ...code here...
    }

Lately, all the languages are getting automatic documentation generation. They use comments like this:

    /**
     * The code comments here get turned into web pages.
     * I like how there's a little extra whitespace above and below.
     * And the stars are a pain to keep adding, even with editor support.
     */
    function name() {
        ...code here...
    }

But these docs look like really decent docs. In Perl:

    #
    # This function does nothing at all.
    #
    sub name {
        ...code here...
    }

Again, there's all that extra whitespace. It gives the eyes a break.

As you can see, the main trends here are to move the comments out of the function, and to add more visual cues in the comments.

# Compile ffmpeg from sources on Ubuntu

For some legal reasons, Ubuntu does not ship with some important features in ffmpeg enabled. It appears that support for faac AAC encoding is stripped. The older tutorials for building from source don't seem to work.

So, I went to the original sources.

I don't have the full instructions yet, but here are some tips.

Use the latest ffmpeg.

Use the latest x264.

* Configure both without the --enable-shared flag.
* Configure x264 with --enable-pic.
* Configure ffmpeg without the --disable-static flag.

I suspect there are better arrangements so that everything can make a shared library... but often you just need to get ffmpeg running.

# Connecting to Network Printers

Suppose you go onto a foreign network and need to print. There's no network administrator around. How do you install the printer?

First, go up to the printer and push the "menu" button. There are "up and down" buttons to flip through menu items. One of the items will be "print configuration" or something like that -- use that and print the configuration. (You can take this back to your computer, by the way.)

On the printout should be a section with the heading "network" or "tcp/ip" or something like that. Look for a line like "IP address".

On your computer, click "add printer". Depending on whether you're on a Mac or Windows, the setup process is different. (Since I don't have either here, I can't go into detail.)

On Windows, you need to call the printer a "Local Printer", not a network printer -- yes, it seems weird, but that's what you do. Set the port type to TCP/IP, and set the IP address to the printer's IP address.

On a Mac you can select the IP printer and specify the IP address.

At this point you can probably allow the computer to automatically choose a driver. It'll take a while, especially over the internet.

If you're impatient, you can do what I do: before starting the whole process, use Google to find the driver for the printer, and install it first. Then, set up the printer. (If it looks like a generic PostScript or PCL driver, you can force it to use the built-in drivers. There's a little risk that it won't work with cheaper printers, though.)

# Convert Text to HTML PHP Function

How many times has this wheel been reinvented? According to Google searches, not enough - because I couldn't find a good one. Over the years, I've built this wheel a few times, so, here goes again. This is a lot better than the stock nl2br() function.

The attached code and test files show it off; only a description follows here.

The text is converted by analyzing it line by line, and building up an array that contains metadata about the document. The metadata describes each line: is it long? Is it blank? Does it look like a quote? The metadata is analyzed to determine paragraph groupings.

This differs from the typical solution of using regular expressions to add HTML code to text. For one, we try not to manipulate the text in place. Rather, we simply "look at" the text, and "notice" features. Later, we analyze the features to determine what tags to insert.

This technique works well because a lot of formatting information is embedded in the layout of the text. By preserving the layout, we can guess what the formatter intended. Also, by allowing for multiple passes over the text, we can refine the metadata.

For example, we could detect if one of the first few lines contains a line that's capitalized like a title. If so, we can assume it's a title, and add that metadata. Then, we can quickly look one line below that and see if it looks like a byline, and if so, add that metadata.

What you detect depends on your data. This function is being written to convert text email messages into HTML, for easier reading on small screens, so bylines and titles aren't that important, but getting quoted text right is.
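The look-then-analyze approach can be sketched in Python (a toy version of the idea, not the attached PHP; it only notices blank lines and ">"-quoted lines):

```python
def text_to_html(text):
    """Convert plain text to HTML in two passes: first gather
    per-line metadata, then use it to group lines into blocks."""
    lines = text.split("\n")

    # Pass 1: "notice" features of each line; don't alter the text.
    meta = [{"blank": ln.strip() == "",
             "quote": ln.lstrip().startswith(">")}
            for ln in lines]

    # Pass 2: group consecutive non-blank lines of the same kind.
    out, buf, buf_is_quote = [], [], False

    def flush():
        if buf:
            tag = "blockquote" if buf_is_quote else "p"
            out.append("<{0}>{1}</{0}>".format(tag, " ".join(buf)))
            del buf[:]

    for ln, m in zip(lines, meta):
        if m["blank"]:
            flush()                       # blank line ends a paragraph
        else:
            if buf and m["quote"] != buf_is_quote:
                flush()                   # kind changed mid-run
            buf_is_quote = m["quote"]
            buf.append(ln.lstrip("> ").strip())
    flush()
    return "\n".join(out)
```

A richer version would record more features in pass 1 (long lines, title-case lines, bylines) and make more passes, which is exactly what the article describes.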

I've left linking to another function, and character escaping to htmlspecialchars().

The paradump() function is not related to all this - it's just a way to view the text alongside the metadata.

Attachments:

* index.php.txt (3.31 KB)
* test.txt (1.45 KB)
* text.txt (916 bytes)

# Convert Web Pages into Kindle "Books" (Documents)

This script below will accept a URL parameter, download the HTML, convert it to a .mobi file with kindlegen, and copy the file onto your Kindle. It works on Ubuntu, but can be altered to work in your environment. It's written in Perl, and requires kindlegen and wget. You can get kindlegen from Amazon's website, and wget is in your repository.

The only "trick" it has is reading the document's title, and using that as the document's filename. That should help avoid problems with files overwriting each other, to some extent.
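The title-to-filename trick can be sketched in Python, mirroring the steps the Perl script below takes (the function name is mine):

```python
import re

def title_to_filename(html):
    """Pull the <title> out of an HTML page and turn it into a
    safe, short filename stem: strip punctuation, hyphenate spaces,
    lowercase, and truncate to 30 characters."""
    m = re.search(r"<title>(.+?)</title>", html, re.IGNORECASE)
    text = m.group(1) if m else "index"     # fall back like the script does
    text = re.sub(r"[^a-zA-Z0-9 ]", "", text)   # drop punctuation
    text = re.sub(r"\s+", "-", text)            # spaces -> hyphens
    return text.lower()[:30]
```

For example, a page titled "My Great Page!" becomes `my-great-page`.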

$KINDLE is the documents directory on your Kindle. If you're using another Linux distro, it might appear in another directory. $KINDLEGEN is the path to the kindlegen command.

    #! /usr/bin/perl

    $KINDLE = '/media/Kindle/documents';
    $KINDLEGEN = '/home/johnk/bin/kindlegen';

    use File::Copy;

    $url = $ARGV[0];

    system("/usr/bin/wget -O /tmp/kindle.html $url");

    open FH, '</tmp/kindle.html';
    @lines = <FH>;
    close FH;

    @titles = grep { $_ =~ /<title>/i } @lines;
    $titles[0] =~ m#.*<title>(.+)</title>.*#i;
    $text = $1;
    $text = 'index' if (! $text);
    print "title is $text\n";

    $text =~ s/[^a-zA-Z0-9 ]//g;
    $text =~ s/\s/-/g;
    $text = lc($text);
    $text = substr($text, 0, 30);

    $filename = $text . '.html';
    $mobifilename = $text . '.mobi';
    print "filename is $filename\n";
    print "mobifilename is $mobifilename\n";

    rename("/tmp/kindle.html", "/tmp/$filename");
    system("$KINDLEGEN /tmp/$filename");
    copy("/tmp/$mobifilename", "$KINDLE/$mobifilename") or die "Copy failed: $!";

# Converting Time or Datetime to UTC in Python

This seems so basic, it's almost embarrassing to publish, but this showed up a few times on Stack Exchange. I had trouble figuring it out, too, partly because the Python docs are so lengthy.

The scenario: you have a textual timestamp with a timezone, and need to convert it to UTC. I had one in an email Date header, and needed it printed in UTC for the envelope. The input looks like this:

    Tue, 17 Mar 2009 18:57:55 -0300

    from dateutil.parser import parse
    from datetime import timezone, datetime, timedelta

    t = parse("Tue, 17 Mar 2009 18:57:55 -0300")

    ## first way
    tu = t.astimezone(timezone.utc)

    ## second way
    tu = datetime.utcfromtimestamp(t.timestamp())

    ## and as text
    text = tu.strftime("%c")

The main difference between the first and second ways is that the second way strips off the timezone info (tzinfo), so it becomes a "naive" datetime. The first way keeps the tzinfo, set to UTC, which is better.

Also, the Date lines in email headers vary, and taking a slice of [6:37] will chop off timezone markings like "(GMT-08:00)" that cause the parser to barf.

# Create SQL tables from CSV headers

Not sure where this goes, but it's a page that will generate MySQL code from the header line of a CSV file.

# Debian Exim: how to Whitelist a host or IP that is in a blacklist

Exim4's docs need some work, especially for the split config. First, you need to make a new config file, /etc/exim4/conf.d/main/000_localmacros. Then, in the file:

    MAIN_RELAY_NETS = 68.142.199.0/24

Or whatever networks you need to allow to relay. Then:

    /etc/init.d/exim4 reload

# Deleting a Windows User You Can't See

Windows XP's Users control panel doesn't show all the users. I had to delete the "postgres" user to reinstall PostgreSQL on my computer.
To do this, I had to run this command:

    NET USER postgres /DELETE

To see all the users, type NET USER.

# Email Obfuscation and Shielding Script

Here's a Perl script that takes email addresses as arguments, and returns JavaScript code that hides your email address from web spiders. The email address is also linked, so it's clickable.

    #! /usr/bin/perl

    foreach my $email (@ARGV) {
        $email =~ s/@/ @ /;
        $email =~ s/\./ . /;

        @parts = split( ' ', $email );

        print "<script type='text/javascript'>\n";
        print "document.write('<a href=\"mailto:');\n";
        foreach my $word (@parts) {
            print "document.write('" . $word . "');\n";
        }
        print "document.write('\">');\n";
        # write the visible link text, then close the tag and script
        foreach my $word (@parts) {
            print "document.write('" . $word . "');\n";
        }
        print "document.write('</a>');\n";
        print "</script>\n";
    }

## Other

To have a firewall running, enable netfilter in the kernel's network options, then:

    emerge iptables


# Gentoo: My Configuration

The notebook computer says:

free ~ # smartctl --health /dev/hda
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct   0x0033   001   001   050    Pre-fail  Always   FAILING_NOW 1023


Yikes! That's the first time I've seen FAILING_NOW. No wonder I get error messages.

The good news is that virtually all my data is on servers. Email is IMAP, bookmarks are in Foxmarks, music and other stuff is on a desktop machine, code is in SVN. The only thing not saved is my system configuration, which is time-consuming to recreate. So, here goes:

free ~ # cat /var/lib/portage/world
app-editors/nvu
app-editors/vim
app-misc/workrave
app-office/abiword
app-office/gnumeric
app-text/gsview
app-xemacs/emerge
dev-db/mysql
dev-lang/php
dev-util/subversion
kde-base/arts
kde-base/kcalc
kde-base/kdebase-meta
kde-base/kdemultimedia
kde-base/kdm
kde-base/kwifimanager
mail-client/mozilla-thunderbird-bin
mail-mta/postfix
media-fonts/corefonts
media-fonts/font-bh-type1
media-fonts/font-bitstream-type1
media-fonts/font-misc-misc
media-fonts/font-sun-misc
media-fonts/gnu-gs-fonts-other
media-fonts/lfpfonts-fix
media-fonts/terminus-font
media-gfx/gimp
media-libs/alsa-oss
media-libs/giflib
media-libs/libogg
media-sound/alsa-tools
media-sound/alsa-utils
media-sound/lame
media-video/ffmpeg
media-video/mplayer
media-video/xine-ui
net-firewall/iptables
net-fs/nfs-utils
net-im/gaim
net-misc/dhcpcd
net-misc/telnet-bsd
net-misc/vnc
net-wireless/ipw2200-firmware
net-wireless/wireless-tools
net-wireless/wpa_supplicant
net-www/netscape-flash
sys-apps/dbus
sys-apps/hal
sys-apps/pciutils
sys-apps/pcmciautils
sys-apps/slocate
sys-apps/smartmontools
sys-boot/grub
sys-kernel/genkernel
sys-kernel/gentoo-sources
sys-process/at
sys-process/vixie-cron
virtual/ghostscript
www-client/mozilla-firefox-bin
www-servers/lighttpd
x11-base/xorg-server
x11-libs/cairo
x11-libs/qt
x11-libs/qt:3
x11-misc/x11vnc
x11-themes/gtk-engines
x11-themes/gtk-engines-qt

free ~ # cat /etc/make.conf
# These settings were set by the catalyst build script that automatically built this stage
# Please consult /etc/make.conf.example for a more detailed example
CFLAGS="-O2 -march=i686 -pipe"
CHOST="i686-pc-linux-gnu"
CXXFLAGS="${CFLAGS}"
GENTOO_MIRRORS="ftp://distro.ibiblio.org/pub/linux/distributions/gentoo/ "
USE="X x11 kde qt4 qt3support qt3 dbus hal mysql mysqli ctype pcre session unicode cgi jpeg png alsa acpi firefox gtk lame ogg mp3 mpeg nas ncurses"
# note, cgi is for php :(

free ~ # cat /etc/lighttpd/phpmyadmin.conf
alias.url += ( "/phpmyadmin/" => "/usr/share/webapps/phpmyadmin/2.11.1.2/htdocs/" )

free default # pwd ; ls -l /etc/runlevels/default
total 0
lrwxrwxrwx 1 root root 20 Nov 25 15:50 lighttpd -> /etc/init.d/lighttpd
lrwxrwxrwx 1 root root 17 Nov 23 12:32 local -> /etc/init.d/local
lrwxrwxrwx 1 root root 17 Nov 25 15:48 mysql -> /etc/init.d/mysql
lrwxrwxrwx 1 root root 20 Nov 23 14:06 net.eth0 -> /etc/init.d/net.eth0
lrwxrwxrwx 1 root root 20 Nov 23 12:32 netmount -> /etc/init.d/netmount
lrwxrwxrwx 1 root root 15 Dec 7 02:51 nfs -> /etc/init.d/nfs
lrwxrwxrwx 1 root root 19 Nov 25 15:49 postfix -> /etc/init.d/postfix
lrwxrwxrwx 1 root root 17 Nov 23 13:59 sshd -> ../../init.d/sshd
lrwxrwxrwx 1 root root 21 Nov 23 14:10 syslog-ng -> /etc/init.d/syslog-ng
lrwxrwxrwx 1 root root 22 Nov 23 14:11 vixie-cron -> /etc/init.d/vixie-cron
lrwxrwxrwx 1 root root 15 Nov 24 13:49 xdm -> /etc/init.d/xdm

That's the basic setup. lighttpd needs to be tweaked, and the phpmyadmin config file needs to be created. The kernel config file, /usr/src/linux/.config, is attached. It's a work in progress, as usual.

Attachments:

* config (48.04 KB)

# Getting Blackberry Desktop to Work with MS Outlook

For some reason, the Blackberry Desktop software wasn't showing Microsoft Outlook as one of the PIMs to synchronize with. I tried versions 4.3, 4.5, and 4.6, with identical results. Intellisync's configuration wizard would appear, and you'd see the Contacts, Calendar, and other options. When you clicked Setup..., a dialog box would open and you'd see only the Yahoo mail and text file exporters.
The solution was to install the software using the "Work Email Address" setting, which is designed to be used with Blackberry Enterprise Services (BES). Do this even if you don't have BES. During the setup, select BES for MS Exchange.

Make sure that MS Outlook is set as your default mail and address book provider. This is in the Control Panel's Internet Options, under the Programs tab.

For some reason, this will cause everything to work. Everything should default to syncing with MS Outlook. It will also cause the Blackberry Desktop Redirector to start up. This may or may not cause problems; it's hard to tell.

Also, for some other odd reason, this solution isn't presented on the forums, but was found on a Zimbra website.

# HTML CSS 3-column Layout with Content Above the Navigation

I was toying with some SEO ideas, and wanted a CSS-based layout that puts the content at the top. After doing so much PITA CSS for a year or two, and then not doing it for a couple years, it suddenly got really easy to make this layout. Maybe the CSS concepts just take time to sink in. It seemed to make more sense as I forgot the language.

Attached is a 3-column layout with two nav bars, a sidebar column, a footer, and a header. The content is right at the top, and all the navigation is between the content and the footer. The layout is fixed, not liquid, because liquid and wide-screen don't mix.

The code is a skeleton, not a functioning layout with all the elements in place. There's no CSS to turn lists into links, for example. CSS Tricks has more information and a couple tricks to make this kind of layout work.

Attachments:

* index.php.txt (1.17 KB)

# Haskell Learning Notes

A couple years ago I tried to learn Haskell and dropped the study. I'm not sure what happened, but it's really hard to find a solid block of time to study it. Haskell syntax is so different from other languages that it's difficult to pick up.
So I'm writing these notes as a kind of alternative study to the (good) tutorial "Haskell for C Programmers" by Eric Etheridge.

Some syntax is simple.

Numbers: 1 2 3

Lists: [1, 2, 3]

Strings: "This is a string"

Tuples, which are kind of like C structs or Pascal records: ( 1, "name", 5.0 )

The difference between tuples and lists is that tuples are always the same length, but can hold different types. Lists are any length, but all the same type. They have totally different uses in Haskell.

Haskell is a functional language, meaning that functions are the common way to break a program down into smaller parts. Where Haskell functions differ from function definitions in languages like C or JavaScript is that Haskell functions are descriptive more than procedural.

Starting here, all code is code as it appears in a source code file. These files end in ".hs", by the way. To use the code, you load it into Hugs98, and then you can call the code from the command prompt. The command prompt is indicated with "Main>".

Functions are defined with the = sign:

    foo x = [ 1, 2 ]

Now, this is a nonsense function. It's called like this:

    Main> foo 4
    [1,2]

It always returns the same value, [ 1, 2 ]. The function name is "foo". The argument is called "x". The return value is always the list [ 1, 2 ].

Here's an equally foolish function:

    foo x = [ 1, 2, x ]

Call it like this:

    Main> foo 9
    [1,2,9]

It substitutes the value of x for the last item in the list.

Here's a more useful function:

    square x = x * x

    Main> square 9
    81

Yet more useful functions:

    tax p = p * 0.0975
    tip p = p * 0.20
    totalTab p = p + (tax p) + (tip p)

## Filters

While the popular tutorial starts with the example of a function that generates the list of Fibonacci numbers, I will do something simpler: filters on lists.

Here's a function that takes a list as an argument and returns the entire list. This is not a filter :)

    notafilter lst = [ x | x <- lst ]

This is a list comprehension -- a statement that describes a list.
This list is [ x ], where x is taken from lst (which is the argument to the function). The thing on the left of | is an expression, and the thing on the right describes the elements. In this function, the expression is just x. It could be a more complex expression.

Here's a function that will filter in all the even numbers in a list of numbers:

    evensFilter lst = [ x | x <- lst, mod x 2 == 0 ]

The expression after the comma (,) is a conditional. If its value is true, the element is included in the output list. "mod x 2" is x modulo 2. Even numbers evaluate to 0. == is the comparison operator.

Here's a function that adds an "s" to each string:

    pluralize lst = [ x ++ "s" | x <- lst ]

    Main> pluralize [ "cat" , "dog" ]
    ["cats","dogs"]

And another one that turns verbs into nouns. Some verbs, that is:

    gerundize lst = [ x ++ "ing" | x <- lst ]

    Main> gerundize [ "park", "crash", "turn", "run", "smoke" ]
    ["parking","crashing","turning","runing","smokeing"]

OK, so it's not that clever, but it's not bad for a one-line program.

And, finally, because this tutorial is on the web, here's a little HTML tag writing code:

    blink str = tag "blink" str

    tag t s = "<" ++ t ++ ">" ++ s ++ "</" ++ t ++ ">"

And a few more:

    p s = tag "p" s
    h1 s = tag "h1" s
    h2 s = tag "h2" s
    strong s = tag "strong" s
    em s = tag "em" s
    br = "<br />"

    Main> p ("foo" ++ em "bar" ++ "baz")
    "<p>foo<em>bar</em>baz</p>"

The parens set the order of operations. This example might be useful.

FYI, Haskell syntax notes.

# Haskell Notes 2

I found a good tutorial at Wikibooks, Haskell. It's beginner level like these notes, but is way more organized. My notes here are more difficult to comprehend (due to lack of editing), but the examples are simple enough for some people to understand.

One thing I like about that book is that they start out without using type signatures. All the other tutorials use type signatures, even though they aren't required. They're really good form, but can get in the way of learning quickly.
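An aside for readers coming from Python: the Haskell list comprehensions in the notes above map almost one-to-one onto Python's comprehension syntax. These are my own translations of evensFilter and pluralize:

```python
lst = [1, 2, 3, 4]

# evensFilter lst = [ x | x <- lst, mod x 2 == 0 ]
# The part after the comma is the conditional; Python spells it "if".
evens = [x for x in lst if x % 2 == 0]

# pluralize lst = [ x ++ "s" | x <- lst ]
# The expression left of | maps to the expression before "for".
plurals = [x + "s" for x in ["cat", "dog"]]

print(evens)    # [2, 4]
print(plurals)  # ['cats', 'dogs']
```

In both languages, the comprehension *describes* the output list rather than building it step by step.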
Here's an example that converts a list of strings into a JSON list of strings. (Sort of - I don't know how to insert double quotes.)

    jsonList lst = "[" ++ ( jsonListJoin lst ) ++ "]"

    jsonListJoin [] = ""
    jsonListJoin (x:[]) = "'" ++ x ++ "'"
    jsonListJoin (x:xs) = "'" ++ x ++ "'," ++ jsonListJoin xs

This defines three versions of jsonListJoin, and the correct one is dispatched by pattern matching.

The first one never gets called normally, but it's in there just in case.

The second version matches the end of the list, where you have one element followed by the null. It's just like the final version, except without a comma after the element, and without the recursive call to jsonListJoin.

The third version is the most general, and it matches any situation where there's a list with two or more elements. The first item is taken and turned into a JSON string, and the remainder of the list is passed to jsonListJoin. There's a comma in this version.

# How to Keep Your Notebook Running Speedy

This is an addendum to the two articles about keeping Windows XP speedy. This article discusses a few issues relevant mostly to laptop computers.

## Check the Disk

Portable computers are more likely to have disk problems than stationary computers. Running a disk scan gives the built-in hard-disk repair features a chance to operate.

Right click on the C: drive icon. Select Tools, and click Check Disk. Check off the option to fix bad sectors. It'll inform you that it cannot check the disk now, but when you restart the computer, it will run the disk checking software.

## Scan for viruses more often

Laptops often end up connected to different networks. Each connection is an opportunity for infection. Scan after you travel with the computer into a foreign network.

## Add Memory

Generally, notebooks start out "behind" in the RAM game, and as the updates accumulate, you hit the "wall" and start using virtual memory.
That means tapping the hard disk, which, as noted above, is likely to have errors. Also, the hard drives tend to be a little slower.

To reduce memory usage, review the other articles. Use MSCONFIG to alter what programs are being run at startup.

# How to Stay Virus Free with Windows XP, the Bare Minimum

1. Get some kind of anti-virus software. Consumer Reports recommends PC-Cillin, which is cheap and doesn't bog the system down.
2. Start using Mozilla Firefox. It's attacked less often than Internet Explorer.
3. Avoid clicking on attachments. Avoid using MySpace. Avoid Yahoo Instant Messenger.
4. Get a copy of the Ultimate Boot CD for Windows, and learn to use it to clean the system of most viruses. What UBCD doesn't catch, the other antivirus software should catch.
5. Get a firewall/router. The one I like is the Linksys WRT54G, but any kind is fine. A hardware firewall will add some security by being a little harder to hack than a computer with firewall software. (You should still run the firewall software.)
6. Set up an extra user with limited access. Use this as your main account, dropping into the administrator (or computer_owner) account to install software.

# How to Stay Virus-Free and Speedy with Windows XP

Every couple of months, someone asks me how to get their computer to go faster. Usually, they're relatively new to computers, and while they get around pretty well on the internet and know how to use their system, they don't always understand how to avoid being attacked by viruses or other "malware", or how to manage their system so it runs fast.

(Thanks to CSH, DKL, ECC, REG, and CEG for putting me to work dealing with these annoying computer issues. Also, thanks to BG of MS for operating the company that created this thing called Windows. Without them, this page wouldn't exist.)

## Good Habits

Comfortable computer use is achieved by practicing good habits, and avoiding bad habits. Bad habits lead to pain.
Everyone has some bad habits, everyone will experience some pain, and I am no exception. I've been hit by viruses, had computers "cracked", and have lost data due to negligence. However, I've also managed to recover from most of these situations relatively unscathed.

This is a lengthy list of good habits. It's best to try each one out for a while, individually, and learn to integrate the good habits into regular use. Good habits are hard to attain (just ask my doctor), so don't criticize yourself too much if you can't do all these things. It's just important that you try.

## Three Types of Users: Administrator, Power User, Regular User

When you set up XP, it asked you to create a name and password for the computer owner (that's you). This is the Administrator account. You should not use the administrator account day-to-day.

XP also asked you to create a Power User, to use the computer regularly. You should not use the Power User day-to-day either. Instead, you should create a third user, a regular user. A regular user is restricted from installing new software and hardware on the computer. This includes "plug-ins" or "ActiveX controls" on websites.

You should use the regular user account as your main account. Very quickly, you'll notice that web pages, and some emails, ask you to install software. When this happens, you should click on the Start Menu, click "Log Off", click "Switch User", and then log in as the power user. Then, you can go back to the website and install the software.

Personally, I tend to use the power user account, but novices should use the regular user because it forces you to learn about all the situations where software is trying to execute. (It doesn't happen just anywhere.) After a while, you'll figure out the situations when you're likely to be asked to install software, and can then make a conscious decision about whether it's worth it or not.
## Use "Add or Remove Programs"

This is advice for people who install or "try out" a lot of software. If you don't do that, skip this section.

The Add or Remove Programs tool in the Control Panel should be used once every couple of months to remove any old software you're not using. Some programs cause the startup and shutdown sequence to launch other programs "in the background". These are programs that don't show up in the task bar, but do show up in the "Task Manager" application, under the "Processes" tab. (To use the Task Manager, right-click on the task bar, and it's one of the menu options.) These "background" programs consume some memory and use some processor time. They're designed to be sparing with their usage, but when you have dozens of programs installed, they tend to add up.

## Don't Install It

Don't install the customized cursors, Weather Bug, screensavers, or browser toolbars. I know you want to, but some of these things are "spyware" and consume processor resources. They may also "spy" on your web surfing and keystrokes, and send the information to a database. That database is a big list of "suckers" or "easy marks" -- people who are willing to install software, and spend money online, without much concern for security. These online marketers will turn your personal information into ad campaigns directed at you, to take your money.

If you have installed it, you can try to uninstall it by referring to the previous section. If that doesn't work, read on.

## Reinstall Windows Occasionally

You should save all your data (see Backups below), erase the hard disk, and re-install Windows every two years or so. This will wipe out all the junk. To do this, you need to do a little planning, and make sure you have all your information in order:

1. Make sure you have CDs for all your software, including the CD Keys. If you don't know a CD Key, you can usually go into the Help->About This Program menu item and find it displayed there.
2. Write down the usernames and passwords you've stored on the computer. They might be inside your browser. If you're using Mozilla, you can go to Tools->Options...->Security tab, and there's a button to view your passwords.
3. Back up your various settings by going into My Computer -> C: drive -> Documents and Settings -> your user name. Then, go into the menu Tools -> Folder Options -> View tab, and select "Show hidden files and folders". You can then see the Application Settings folder, and copy it to backup media.

If you have enough spare disk space on a second disk, you should keep all your installers, especially the ones you download from the Internet. Now would be a good time to go and download the latest versions of your favorite software.

Finally, you can re-install Windows, or run the restore CD, and clean out your system.

## Keep a Software Library

Get some large envelopes and some magazine storage boxes, and put your CDs and software license certificates in there. It'll take up some space, but you need that information to reinstall your software (or to sell it).

## Use MSConfig to Disable Annoying Startup Junk

Being somewhat inexperienced with Windows, I didn't know about this useful tool. It allows you to prevent startup programs from running. To use it, press Windows-Key-R or Start->Run.... Type "msconfig", and press Enter.

Each tab shows you a little bit of the startup sequence. The most crowded area is the last tab, where apps like iTunes and Real Player install tiny programs that check for updates. They suck up some resources, and when there are enough of them, things can get slow. Flip them off by unchecking them. Of course, you can't just flip everything off, but if something goes wrong with one, you can turn it back on.

## Scan for Viruses

I tend to not run any virus detection software. Instead, I just go to McAfee and run a free scan there. This is a way to check that my habits are working.
I check around once every three months, but more often on new systems. Symantec.com also has a free scan. If you have a virus, you should probably buy one of the products to disinfect yourself. The new hot product is Kaspersky's virus scanner. They don't have a free version, but they do have trial versions.

## Use Firewalls

Windows XP comes with a firewall, and for starters, you should use that. To set it up, go to the Control Panel, then Security Center, then scroll down the window to the Windows Firewall icon. Make sure it's ON. Then, look at the Exceptions tab. That lists programs that are set up to listen for incoming internet connections. You should disable some of them a couple of times a year, just to see what happens (or what stops working).

If you're on Windows 2000, you should definitely use a software firewall like Zone Alarm, or, my fave so far, Outpost Free. These programs have more features than the regular Windows XP firewall, but basically do the same thing. They also give you a nice overview of what traffic is active on your computer.

If you are using a DSL or cable modem service, you should also get a router. These are devices designed to allow more than one computer to connect to the high-speed line. They also include a simple firewall. By using one of these, you add an extra layer of security to your network. The only negative aspect of a router/firewall device is that some applications, like some kinds of file transfer over peer-to-peer networks, will fail, or become difficult to set up, because you have to mess with the firewall first. (If you want life to be a little easier, get one that features UPnP, or is called a "gaming router".)

## Use Good Passwords

I once worked at a company that used really weak passwords. That was the first place I experienced a computer break-in. It sucked.

A good password has a combination of words, numbers, upper and lowercase letters, and maybe some punctuation.
A good password can also be very long, like a complete sentence. "5TT3err%" is a good password. "ohmanmyfingerhurts" is a good password. "password", "admin", and "ucla" are bad passwords. There's a password quality evaluator elsewhere on this site.

## Run Backups Regularly

It's critical to have a good backup system. The computer or hard drive will fail, eventually. There are two important aspects to doing backups painlessly: organizing your data, and organizing your backups.

First, you need to organize your folders, so all your data is in one place. Windows wants you to put everything in "My Documents", and I suggest using it. Within My Documents, create a filing system of folders within folders, to organize your documents and/or work. You might make one folder per client, or one per project, or organize files by the type of file. Personally, I tend to keep one folder per client, and put projects within it.

Once your files are in one place, and organized, it's not that hard to plan a backup. I could go on at length about backup strategies -- entire books have been written about it. The basic, simple strategy is to buy enough extra, external storage for all your files, and run a backup at least a couple of times a year. If you have data that changes a lot, back that up every week or so. There's a lot of software out there to help with this, and that might be discussed on another page. There's also a built-in backup tool, under Start Menu -> All Programs -> Accessories -> System Tools -> Backup, that performs different types of backups. I don't use it, but it's there if you wish to implement a more rigorous backup system.

## Don't Click on Attachments

If you don't know why someone's sent you an attachment, don't open it with a double-click. Instead, save the attachment to the Desktop, and open it with the appropriate application, or with Notepad.

Also, don't use Outlook Express (or Outlook, if you can avoid it). Those are the most-attacked programs.
## Use Mozilla

Use Mozilla Firefox and Mozilla Thunderbird. They are a bit more secure than Internet Explorer and Outlook Express. This may change as they get more popular, but today, the Mozilla programs aren't attacked much by the malware writers.

## Find Alternatives

A lot of popular software has alternatives. For example, I use a (somewhat hard to install) app called GAIM instead of AIM and Yahoo Messenger. Thus, I can delete both AIM and Yahoo Messenger, which take up a lot of space, and also slow down the computer more than GAIM does. By using simpler alternatives, which use less CPU and RAM (and are usually free), you can speed up your computer overall.*

Here are some alternatives:

- Yahoo Messenger, AIM = GAIM or Trillian
- Outlook Express = Mozilla Thunderbird or Sylpheed
- Windows Media, Quicktime, Real Player = Video LAN Client (sometimes)
- iTunes = WinAmp Free (the smallest version)
- MS Office = MS Works (which usually comes free with computers, or costs $10 on eBay)
- Photoshop = The GIMP

* A reason why this speeds up the computer is because you avoid using up all your random access memory (RAM), which is on a chip, and avoid causing the computer to use "virtual memory" (VM), which is on the disk.

Winson's Place: another good article

# How to Stay Virus-Free and Speedy with Windows XP, Part 2

## Recovering From An Infected System

I didn't realize how lucky I was to have avoided viruses. A system came to me with a virus that prevented users from typing in the access information to AOL's virus system, and seemed to also hide from some virus scanners. The solution is to use a "boot CD" to start up the system from the CD-ROM, and then run tools to clean off the hard disk.

Boot CDs started out on Linux, where it was not entirely unusual to set up machines to boot up (start up) into different operating systems, or even different configurations of the same operating system. The next logical step was to put the entire operating system onto the CD. This idea led to the creation of Windows Boot CDs.

The one I'm using currently is The Ultimate Boot CD for Windows, which is based on Bart's PE, a boot CD system. It comes preinstalled with all the free command-line virus scanners.

### F8

After one run through with the boot CD, I did a session using "F8". When you reboot into XP, start hitting the F8 key to get the menu to start Windows in "Safe Mode". Safe mode starts up Windows, but doesn't start up most of the drivers or services, thus preventing viruses from starting.

Boot into safe mode with networking, and then go to the virus scanning sites (listed above). They'll find any stray viruses. You can then remove the files manually. Easier said than done, though... Viruses know how to hide, and anti-virus tool vendors don't want to make it too easy to clean yourself.

The first tool in your arsenal is the "Search..." program from the Start Menu. Type in the filename and see if it comes up. If it does, delete the file.

If it doesn't, the virus is located in some hidden directory. That means you have to use the Command Line, cmd.exe. McAfee displays the first directory, so you can usually CD into that directory. Then, you can do a "DIR /A" to display hidden files. Using a little cut and paste, you can build the correct path for Search.

For example, one virus was detected in C:\System Volume Information\_restore{987E0331-0F01-427C-A58A-7A2E4AABF84D}. I had to dig around to build that path, but once it was in Search, it found the offending file, and it was deleted.

## Cleaning Up the Disk

I believe that keeping the disk clean is of dubious value, unless the system is very old. Most slowdowns are due to applications and small programs executing, consuming memory. This causes RAM to run out, and forces the system to swap to disk (that is, it saves out part of RAM to disk, and then loads up data from disk into RAM).

That said, there are some disk tools that, at the very least, look useful. They are located in Start Menu -> All Programs -> Accessories -> System Tools. Disk Cleanup compresses old files, and deletes temporary files. Defragment Disk rearranges the blocks on the disk so that file access will be a little faster. If you're going to use them, run the cleanup first, then defragment.

Before you defragment, you may want to twiddle the virtual memory (VM) settings a little bit. Turn it down to a small size, or use no paging file if you have enough RAM. Then, defrag the disk. Then, boost the VM back to its prior size or larger. This will cause the VM page file (the file where VM is stored) to be a large, contiguous block. VM access will improve.

I've noticed that some people have slow disks, and that can kill performance upgrades. If you get a significant motherboard upgrade, it's a good idea to get a new disk that will run as fast as the built-in IDE controllers on the motherboard. If the system has PCI-X slots, get a 3.0 Gb/s SATA card and a SATA drive. This will improve booting and program loading times. Additionally, get enough RAM so you don't swap to disk. VM isn't supposed to be something you use regularly. It's there for emergencies, when you really need just a little extra space.

# How to Update URLs in a MySQL Database after Moving a Site with WGET

Sometimes you need to move your old website off of a CMS, or at least archive it, and the only way is to use WGET to mirror the website. Wget downloads entire websites, turning dynamic sites into static sites. The following command would download the site http://www.theoldsite.net/mypath/index.html:

wget -H -Dimghost.com,theoldsite.net '--restrict-file-names=windows' -A gif,jpg,html,tcl -np --convert-links --html-extension -rx http://www.theoldsite.net/mypath/index.html

That would download the gif, jpg, html, and tcl files from both imghost.com and theoldsite.net, turn the URLs into Windows-compatible file names ending in ".html", and convert links into relative links, so the output folder can be moved.

That's all fine if you just want to link out to the site, but if you link to specific pages within the site, you now have to fix all the URLs. If you're using a database, and these URLs are in a table, it's not difficult to fix:

update stories set link=replace(link,'http://www.theoldsite.net/mypath/','/old-site/www.theoldsite.net/mypath/') WHERE tags like '%relevant%';



Don't run that command as-is. You have to alter it to work with your URLs. The basic idea is to replace the left part of the URL with your new site's URL, then replace weird characters within the URL, and append '.html' to the URL.
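The same idea can be sketched outside of SQL. Here's a minimal Python sketch of the rewrite, using the example URLs from the wget command above (the story URL itself is made up for illustration):

```python
# Sketch of the link rewrite described above: swap the old site prefix
# for the mirror's local path, then append ".html" to mimic wget's
# --html-extension naming. The prefixes are examples, not real paths.

OLD_PREFIX = "http://www.theoldsite.net/mypath/"
NEW_PREFIX = "/old-site/www.theoldsite.net/mypath/"

def rewrite_link(link):
    """Rewrite one stored link to point at the wget mirror."""
    if not link.startswith(OLD_PREFIX):
        return link  # leave links to other sites alone
    link = NEW_PREFIX + link[len(OLD_PREFIX):]
    if not link.endswith(".html"):
        link += ".html"
    return link

print(rewrite_link("http://www.theoldsite.net/mypath/story-42"))
# -> /old-site/www.theoldsite.net/mypath/story-42.html
```

The SQL REPLACE() call does the prefix swap in bulk; this sketch just makes the transformation explicit so you can check it against a few of your own URLs before touching the database.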

# Image with Transparent Caption

Here's some HTML and CSS to make an image with a transparent caption that displays over the image.

<style type="text/css">
.caption-background {
width: 500px;
background-color: black;
opacity: 0.7;
margin-top: -80px;
color: white;
}
.caption {
vertical-align: bottom;
font-family: Helvetica,Arial;
margin: 0px 10px 0px 10px;
}
.caption H1 {
margin: 0px;
font-weight: normal;
}
.caption P {
margin: 0px 0px 15px 0px;
}
</style>
<img src="evergreen-soshiki.jpg" width="500" height="257" />
<div class="caption-background">
<div class="caption">
<h1>Caption caption caption caption</h1>
<p>By Author Name | Date | N Comments</p>
</div>
</div>


# InDesign: Black Prints as Gray

InDesign Help had an article about this problem, where you think you're printing black, but it's coming out of the printer as gray.

Even worse, the blacks in the images come out black, making your gray look ugly!

The solution is to go to Edit->Preferences->Appearance of Black.

Set On Screen to "Display All Blacks Accurately".

Set Printing to "Output All Blacks as Rich Black".

Rich Black is a black produced by combining CMY inks with K. Normally, I think, black gets output as 100% K ink (black ink) alone, which actually looks a little bit gray when viewed next to Rich Black.

(Just to confirm, I looked at some offset-printed pages on glossy paper. Indeed, black ink looks lighter than black ink combined with another ink! You can see this by comparing a black graphic with a color graphic overlaid with some black text. The black text doesn't knock out the color ink -- the software is probably trying to avoid registration problems that would show up as white edges on the letters.)

Once you do that, InDesign seems to convert all the blacks automatically. But if you've manipulated a color yourself, you'll have to adjust it yourself. That's what worked for me.

# Indenting Styles for C-Style Code

People get into all kinds of gripey little snits about how to indent code. Whenever you start a project, it's pretty important to nail down indentation, because it's one of those personal preferences that becomes "a big issue" when there's a conflict. Usually, the indentation itself is a non-issue, but it's something to fight about instead of discussing the real underlying issues, like interpersonal communication problems.

So, let's catalog some styles, and discuss:
```
if (a==b) {
    do_this();
}
```
That's the standard Java style. It's pretty compact.
```
if (a==b)
{
    do_this();
}
```
That's the standard C style, and it puts a little extra whitespace in there. It's my favorite style, because it is the easiest to read.
```
if (a==b)
    {
    do_this();
    }
```
I think I saw that in Code Complete, a very good book about programming style, by Steve McConnell. I recommend the book, but not this indentation style. It's nice that the braces are aligned with the code... but it leaves the "if" way out there.
```
if (a==b)
  {
    do_this();
  }
```
Hmmmm... That's a variation of the previous one. I don't like it. It's not irrational, but, still.
```
if (a==b)
{
    do_this();
}
else
{
    do_that();
}
```
This is my preferred style. I hate the way the else sits there, but only when I'm typing it. When I come back to read the code, it seems nice and airy. It also works well with "else if (...)"
```
switch ($a) {
    case 'a':
        do_this();
    break;
    default:
        do_that();
    break;
}
```

This is a different statement. I didn't like putting the break at the left edge of the block at first, but now I like it, because it highlights that the program won't continue into the next block.

```
switch ($a) {
case 'a':
    do_this();
    break;
default:
    do_that();
    break;
}
```
That was my old style. I like how the case statements stand out, but, now, it's hard to just allow the code from one block to continue into a different block. You could do it, but it'd be hard to notice, and could lead to some nasty bugs.
```
switch (a)
{
    case 'a':
        do_this();
    break;
    default:
        do_that();
    break;
}
```

This is like the spaced-out style I like, but I'm still not used to dropping the bracket onto the next line. Maybe it'll make sense, eventually.

```
a   = 1;
cat = 2;
dog = 100;
```

I saw this in some Visual Basic code. It looked cool, but adding more text to the block looked tedious.

```
a = 1;
cat = 2;
dog = 100;
```

That's my style. Lazy.

```
a      = 1;
cat    = 2;
hotdog = 100;
```

A lot of people are into aligning the equals sign. It seems like a lot of work to me, especially if you have very_long_variable_names.

Every language has its own subtle rules, because every language has unusual features that may or may not translate well to the screen. For example, in Perl:

```
map { code...; } @array;
```

I like to use that, but if I were going to be more uptight:

```
map {
    code...;
} @array;
```

That's not quite right, in my opinion. The code block isn't really just a code block -- it's passed as an argument to map. Perl lets you pass functions as arguments, and the map command will apply the function to the array. The function is defined in-line. Another form of map is written like this:

```
map(funcname, @array);
```

Another way to write it is to use the block again:

```
map { funcname($_) } @array;
```
Alternate uses like these should influence the most wordy, indented style. You don't want the "big" style to be too different from the "small" style. You want them to look the same when they're doing similar things.
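The point about map above -- passing a function that gets applied to each element -- isn't Perl-specific. Here's a quick Python sketch of the same two forms (the words are just placeholder data):

```python
# Python's map() mirrors the Perl forms: pass a named function, or an
# in-line one (a lambda plays the role of Perl's { ... } block).

words = ["cat", "dog"]

# named-function form, like map(funcname, @array)
upper = list(map(str.upper, words))

# in-line block form, like map { ... } @array
suffixed = list(map(lambda w: w + "!", words))

print(upper)     # -> ['CAT', 'DOG']
print(suffixed)  # -> ['cat!', 'dog!']
```

Either way, the function travels as an argument, which is why the indentation of the block form matters: it has to read as an expression, not as a standalone statement.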

# Installing R Packages Globally (for rApache)

In Ubuntu Linux, the path to the global libraries is: /usr/local/lib/R/site-library/

To install there, you can do install.packages(c('foo'), '/usr/local/lib/R/site-library/')

or take advantage of the built in variable: install.packages(c('foo'), .Library.site[1])

Check that .Library.site has the values you need.

You can also use R CMD INSTALL -l /path/to/library foo

(It didn't work for me... :( )

Below is a story about installing globally from source:

I was trying to run some Rook code in rApache, and discovered (via RApacheInfo, r-info) that the package wasn't attached. Not being that familiar with either, I figured I needed to install a package globally.

The right way is described at stackoverflow by Dirk Eddelbuettel. littler is a scripting front end for R, so you can write R scripts as if they are regular scripts. (Normally, you need to go through the trouble of using here files.)

Install littler

apt-get install littler

I copied the example scripts into my local bin

cp /usr/share/doc/littler/examples/* ~/bin

Then installed Rook

sudo ~/bin/install.r Rook

Restarted Apache

sudo service apache2 restart

Then, went back to the RApacheInfo page to look at the libraries. Rook was there! Yay!

But going back to the URL with the Rook script failed.

Tailing the server logs showed that rCharts wasn't installed.

So I then tried to install rCharts.

Didn't work!

sudo -s
cd /usr/local/lib/R/site-library
wget https://github.com/ramnathv/rCharts/archive/master.tar.gz
R CMD INSTALL -l . master.tar.gz
service apache2 restart
# and then when it works
rm master.tar.gz


Turned out I needed more packages installed. Run these as root (or as recommended in the link, as a member of the staff group):

~/bin/install.r plyr
~/bin/install.r RJSONIO
~/bin/install.r whisker
~/bin/install.r yaml
~/bin/install.r zoo
~/bin/install.r DBI
# the following might require the libmysqlclient-dev package
~/bin/install.r RMySQL
# the next one doesn't work for R 3.0
~/bin/install.r devtools


Note: I haven't cleaned up my script and some of those libraries are extraneous... sorry.

All this stuff isn't automated, so you should paste it into a script. You'll need to run the update.r script later to update your packages.

Once that was done, the script could run a Hello, world program.

Getting the database going was a whole other task.

## The rApache Config Lines

These follow the tutorial at the rApache site.

```
<Location /RApacheInfo>
    SetHandler r-info
</Location>

<Location /RToeChart>
    SetHandler r-handler
    RFileEval /home/johnk/Dropbox/www/foobar/firstplotrapache.R:Rook::Server$call(app)
</Location>
```

## The MySQL cnf file

There are several ways to pass password info to the application, but the way I like is MySQL options files, aka the my.cnf file. In Debian systems, they are in the files /etc/mysql/conf.d/*.cnf.

Become root. Create a file called foobar.cnf:

```
[rstudio]
user = user
password = *****
host = localhost
port = 3306
protocol = TCP
database = foobar

[rs-dbi]
database = foobar
```

Then you have to set the file owner and mode:

```
chown www-data /etc/mysql/conf.d/foobar.cnf
chmod go-rw /etc/mysql/conf.d/foobar.cnf
```

That's my setup. I don't think the rs-dbi section is required, but I have it there as a fallback.

# Intel Motherboard Computer Crashes Without BSOD

We got these new computers at work, and for some reason, mine was crashing. Since I made the computer selection, I chose Intel mainboard systems. Sysadmins like Intel, but they are nearly invisible in the marketplace, and not favored by either gamer screwdriver shops or mass manufacturers. They sell some mobos to the mass makers, but you also see other brands like Asus, ECS, and MSI in a lot of boxes. So I went with Bytespeed, a small screwdriver shop servicing school systems, that only uses Intel motherboards.

The problem was, the computer crashed, and in an unusual way. The screen would get "noise" that looked like an old TV not latching onto a signal. It would crash, and there was never a BSOD or a crash log. (That's the price you pay for getting a "fast" computer rather than a midrange one - instability.)

Typically, I like to diagnose the issue rather than get immediate warranty service. For one, by waiting it out, you improve your odds of getting a more debugged product. Send it back immediately, and you're still pulling from a potentially faulty batch of parts. Aside from that, it's entirely possible that it's not the hardware. So I worked slowly to diagnose.
(Also, the company's 5-year warranty gives you a lot of leeway.)

The first things tried were easy: replace the keyboard and mouse. Maybe they were flaky. That didn't fix it. Next, I disconnected my cell phone from the USB. Again, no luck. Finally, the computer was moved, another computer was brought in, and I used Remote Desktop to work on the crashing machine. The system went super-stable. It never crashed, and never disconnected. The terminal computer also never failed (another new Bytespeed).

My theory shifted: it could be the monitor. The monitor was an old IBM CRT with very good color. Lots of range. It's also from the late 1990s. Being a decent monitor, it had Plug-and-Play. There's a signal that told the computer it was an E94, and the optimal resolutions.

So, after a couple of weeks of Remote Desktop, the computers were re-arranged again, and an older Dell monitor was attached to the computer. The system remained stable. To confirm that it's the monitor causing the trouble, I'll have to reconnect it at some point and see if it causes crashes.

A side note: during the computer setup, I had a lot of problems with older USB devices plugged into the USB 3 ports. They caused crashes. So it's possible there was something in the USB 3 ports before, causing the crash. However, given that those USB crashes generally resulted in the keyboard or mouse freezing up, I don't think the USB ports caused the specific crash I was having before. (Now I understand why computer vendors sell computer systems with peripherals - less trouble.)

Through all this, Bytespeed has been good. They're always contacting me about the status of this computer. They have competent tech support - around as good as the Dell business-class tech support (which is really good, imho). This computer problem, if you consider it, could not have been solved by regular tech support. The problem, I'm assuming, was this old monitor, which nobody else is going to have.
What it took was a technical person experimenting to discover the problems and quirks of the hardware.

Also, I don't consider switching to another mobo brand, except maybe Asus, to be an option. I've had too much trouble with the other brands. Support is generally nonexistent after the first couple of years - the churn of boards and features is impressive, but scary too. They use less 'leet parts. With Intel, you pay more, but get what are considered better parts, and another year or so of driver updates.

# Javascript Calculator: Split Up Your Receipts

Here's a Javascript calculator that was put together to deal with situations where you have to split up a grocery receipt with a friend. You can type in the prices, one per line. Check the box if it's a taxable item. (Set the tax rate if it's not 8.25%.) Then, click the "+" button to add it up.

# Josh's 3-Column Layout in CSS

Josh Haglund came up with an awesome way to do a 3-column layout in CSS.

Let's suppose you have three DIVs, arranged into three columns with the float:left and float:right styles. (Chances are, if you're reading this, you know what this is. If not, Google some other pages, and see what others do.) The common problem (aside from learning to use floats) is that the columns aren't all the same height.

The quick solution is to create a background image that looks like the 3-column layout. If it's a simple layout, then you should be able to use a 1-pixel tall, very wide line, repeated several hundred times, to create the columns. Put that skinny gif into a DIV via background-image:url(skinny.gif). Then, within this DIV, you have the layout. Wherever a column is short, the background image displays, making it look like the column extends to the bottom of the layout.

For best results, make the layout first, then create the background image.

# Kindle Tricks (Linux)

An Amazon press release said that they sold more Kindle books than paper books.
That might be true, but they probably included the thousands of books being sold for free, or for a few dollars. There are numerous public domain books "for sale" on Kindle. I downloaded several dozen.

Here are a few Amazon Kindle tricks.

If you take the clear plastic protector sheet that's stuck on the front, and stick it to the back, the cold metal back of the Kindle won't touch your fingers, and your hand will stay warmer.

The Amazon tag "kindle freebies" will bring up all the free books you can get. Be careful, because a lot of free books are being sold for $1 to $3.

The site feedbooks.com will also send you free books, but Amazon will charge 15 cents per book for the download (unless you copy it via USB).

Mobigen.exe works in WINE. The linux_mobigen is no longer on the mobi site, it seems, and the random one floating around requires libstdc++.so.5, which doesn't come with my distro of Ubuntu (I think the lib is an older version).

A couple of aliases that could help:

```
alias kindle="sudo eject -t /dev/sdb"
alias mobigen="wine ~/bin/mobigen.exe -verbose"
```

The kindle command will cause the device to be mounted. That way, you don't have to keep unplugging the USB cable to get the Kindle to show up as a disk. The mobigen command just runs mobigen from your bin directory.

mobigen will convert plain HTML files into .mobi files, which can be read on the Kindle. That's the good news. The bad news is that most web pages have a lot of Javascript on them, so you need to view the printer-friendly version of the page, and convert that instead. The -verbose option seems to help it make the files.

If you keep getting the "can't make temporary file" error, try this: first, run "wine cmd.exe". That gives you a DOS-style shell. Then type "bin/mobigen.exe File-to-convert.html" within the DOS shell. That works for me. You just don't get to use all the file-name completion features.

An interactive ebook authoring tool is eCub, by Julian Smart, who made wxWidgets. (Haven't tried it yet.)
It uses the mobigen tool to generate the .mobi file.

This .mobi file kind-of sucks. It's a proprietary binary format without an open source implementation. It would be nice if the Kindle had support for the .epub format. It would make it a little easier to do things like convert web pages into books, and copy them onto the reader. I guess Amazon is using the iTunes model here. The simple-to-use pathways are all proprietary and have DRM, and making it easy to load other content onto the reader, while possible, is not a priority. This may help authors and Amazon make some money now, but it could harm the utility of the Kindle in the future, because competing readers have .epub support.

People report that Gmail works well with the Kindle. It's kind of clunky.

# LPIC-1 Exam Self-Cram Notes

I was looking around and stumbled across an article about the LPI exams, which are generally considered the best of the many certs out there. That's to say, they are the toughest. It turns out there are a bunch of people selling old tests. LPI also has a cram course, and is going to be proctoring tests at SCALE, at a discount.

I'm not sure I can handle the LPIC, but this article is an attempt at self-assessment, and can be used as a study guide. I'm going to copy the content from the following page, and use it as an outline to flesh out: http://www.lpi.org/linux-certifications/programs/lpic-1/exam-101/

The layout on this Drupal install is screwed up, so you can see the original text here at LPIC-1 Exam Cram.

## System Architecture

### Determine and configure hardware settings

- Weight: 2
- Description: Candidates should be able to determine and configure fundamental system hardware.

#### Key Knowledge Areas

- Enable and disable integrated peripherals.
  - This is generally done through the BIOS settings, which are accessed by pressing DEL, F10, or F12 during boot. There's usually a screen titled "Integrated Peripherals" where these can be disabled.
- Configure systems with or without external peripherals such as keyboards.
  - Again, this should be disabled via the BIOS. Some older BIOSes have an option where the system can halt on keyboard errors - make sure that isn't set.
- Differentiate between the various types of mass storage devices.
  - SATA and IDE spinning disks and solid state disks (SSDs) show up as SCSI devices under /dev/sd?*.
  - On older systems, these show up as /dev/hd?*.
  - Firewire-connected devices usually show up as /dev/sd?*.
  - USB-connected disks generally show up as /dev/sd?*.
  - USB flash memory sometimes shows up as two disks, because they are partitioned to have a small partition with Windows software to grant access to the second partition.
  - Older-style RAID arrays appear as /dev/sd* or /dev/sc*. They are generally managed by a boot-time BIOS on the card.
  - CD and DVD RW disks show up under /dev/sr?* (scsi removable), and are also symlinked from the well-known links /dev/cdrom and /dev/dvd.
  - Network-mounted storage can be available via SMB or NFS.
- Set the correct hardware ID for different devices, especially the boot device.
  - I don't understand what this is. It could mean the GUID values that are written to partitions. It could mean the SCSI IDs. It could mean the IDs that Grub uses to boot the system. Needs further investigation.
- Know the differences between coldplug and hotplug devices.
  - Coldplug devices are generally cold-plug because of physical limitations: PCI and IDE devices cannot be hot-plugged, and PS/2 ports cannot be hot-plugged.
  - USB, Firewire, SATA, and eSATA devices can be hot-plugged.
  - All devices are detected by udev. On old Linux systems, hald (the hardware abstraction layer daemon) would trigger behaviors. Now it looks like udev has been made more capable of triggering programs. udev manages the /dev directory.
- Determine hardware resources for devices.
  - I'm not sure what this means.
It may mean knowing about the layouts of /sys and /proc, and how to drill down into the buses and devices to discover device IDs and memory mappings (for DMA devices).

- Tools and utilities to list various hardware information (e.g. lsusb, lspci, etc.)
  - lsusb - lists the USB bus
  - modprobe and lsmod - list the active drivers, and can load a driver
  - lspci - lists the devices on the PCI bus
  - lsblk - lists all block devices, similar to mount
  - lspcmcia - lists PC Card (PCMCIA) devices
  - lscpu - describes the CPU
  - lshw - lists everything
- Tools and utilities to manipulate USB devices
  - lsusb - lists USB devices
  - usb-devices - a report that's way more detailed than lsusb
  - I don't know enough about this topic.
- Conceptual understanding of sysfs, udev, hald, dbus
  - udev manages the /dev directory.
  - sysfs - do they mean /sys? Or do they mean the thing that lists the available file system types?
  - udev can trigger actions, among them talking to hald.
  - hald is a deprecated layer that manages hot-plug and cold-plug devices. Prior to hald, there were only cold-plug possibilities. hald integrates with dbus.
  - dbus is a pubsub messaging bus from freedesktop.org, providing a desktop-environment-neutral way to pass messages between programs. hald and udev events can send messages to the dbus. Hotplug events can thus be relayed to the desktop, which can then mount devices, load drivers, or run programs.

#### Terms and Utilities

- /sys
- /proc
- /dev
- modprobe
- lsmod
- lspci
- lsusb

# Laptops are a Virus Risk: How to Email Safely

It's been seven years since the "I LOVE YOU" email virus of 2000, but these email viruses still manage to infect people. More importantly, email-based trojans are still being used to launch more complex, and subtle, attacks. (See Timeline of notable computer viruses and worms.)

A contemporary high-risk scenario involves laptops that leave the office, and become home computers in the evening.
Office networks generally have some form of malware detection and quarantine. More sophisticated sites run centralized file scanning and email scanning, combined with restricted user access, to reduce the impact of malware. So, within the office network, when a recognized virus appears, it's contained, and doesn't have the opportunity to destroy the network.

Outside of the office, though, tight security is a lot less common. Computers connected to the internet are attacked, relentlessly, by armies of "zombied" computers. Email malware floods into mailboxes.

## Avoiding the Plague

One way to avoid the risk of the plague of malware is to modify your computer use so that it's a less inviting target. The following techniques will reduce your risk.

### Environment

If you use Outlook for work, don't use it for your personal email. The Outlook and Outlook Express email clients are the most popular targets for virus-writers. They know that everyone gets a free copy of either one (or both) with their new computer. They also know it's hard to disable Outlook Express. By using one of the less popular email applications, you deprive the viruses of the "environment" they need to spread. Some popular clients are Thunderbird, Sylpheed, Pegasus, and Eudora.

### Turn on Anti-Virus at the ISP

If you're using your ISP-provided email address, you should find out if they offer anti-virus scanning. If so, you should turn that feature on. If they charge for it, you should consider paying, or switching to another email service.

### Use Webmail

The big webmail sites do virus scanning. Hotmail, Yahoo, and Gmail can scan your messages for viruses. These aren't totally risk-free, but they are safer than nothing.

# Loop Faster

nzakas has a great presentation about speeding up Javascript loops, but it applies to any language that uses C-like loop structures.

The first principle is not to call a function in the comparison, if the compared value doesn't change. (This is pseudocode, by the way.)
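In runnable JavaScript (where length is a property rather than a method, but the principle is the same), hoisting the bound out of the loop looks like this:

```javascript
// Sketch of the "read the bound once" idea.
// In real JavaScript, length is a cheap property access, but the habit
// still matters for anything that computes its size on every call.
function sumAll(a) {
  var total = 0;
  var len = a.length;            // read the bound once, before the loop
  for (var i = 0; i < len; i++) {
    total += a[i];
  }
  return total;
}

console.log(sumAll([1, 2, 3, 4])); // → 10
```

The function name and example data are mine, just to make the idiom concrete.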
Bad:

```
for( i=0; i < a.length(); i++ )
```

Better:

```
len = a.length();
for( i=0; i < len; i++ )
```

The second principle is to count down rather than up. This is better:

```
l = a.length() - 1;
for( ; l >= 0; l-- )
```

The next optimization should be obvious:

```
l = a.length();
for( ; l-- ; )
```

And since we're not initializing or testing:

```
l = a.length();
while( l-- )
```

The speedup in interpreted languages is huge, but even in compiled languages there are speedups, because there's typically a "not equal to zero" instruction, or something that can leverage a comparison to zero. Additionally, this code is easier to debug once you understand the idiom.

# MS Access VBA: Error -2147217900 (80040e14)

Jawahar on Expertsforge says this is an SQL syntax error where a keyword is used as a field name. In Access, the app finds these keywords and quotes them before running the query. It's all done behind the scenes, but you can expose this feature through the query design tool.

Create a new query in design view. Bring up the SQL view. Paste your SQL in there. (You are probably already at this point, testing your SQL and knowing it works.) Go to the Design view again. Then, go back to the SQL view. Access should have added some parentheses and square brackets. The square brackets are used to quote keywords. You can then fix your code by quoting your keywords. (Use the backtick (`) instead of square brackets to be more normal.)

# MS Access, Outlook: recording bounced email addresses

This is a subroutine that will scan your Outlook inbox, or a subfolder of the inbox named "Bounces", and copy bounced email addresses to an MS Access database. It will then join the table of bad addresses to another table (of people, presumably) and null out the bad addresses, so you won't send to them again.

This code is pretty jacked up, but it works for my specific configuration, which is Outlook as the client and Exchange as the server.
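The heart of the routine is string surgery: find the line of the bounce body that holds an address, then strip the surrounding punctuation. Reduced to a sketch in JavaScript (the real VBA matches specific bounce formats; this simplified version just takes the first "@"-bearing line, and the function name is mine):

```javascript
// Simplified sketch of the address-extraction step.
// A real bounce parser matches known bounce formats line by line;
// this just grabs the first line containing "@" and strips common
// wrapping characters (angle brackets, parens, quotes, whitespace).
function extractBouncedAddress(body) {
  var lines = body.split("\n");
  for (var i = 0; i < lines.length; i++) {
    if (lines[i].indexOf("@") !== -1) {
      return lines[i].replace(/[<>()'\s]/g, "");
    }
  }
  return null; // no address found; the VBA falls through to a fixme case
}

var body = "Your message did not reach some recipients.\n<joe@example.com>";
console.log(extractBouncedAddress(body)); // → joe@example.com
```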
Many addresses won't be detected, because Exchange removes the internet email address, substituting the user's real-world name instead. For those, you'll have to manually remove the addresses.

(The problem here is "indirection". Outlook and Exchange try to hide the ugly internet email addresses, and use a more complex system that allows you to use the user's real name, and have it resolve to a record in a directory. That record contains the real address, whether it's an X.400, internet, or Exchange address. The problem with this is roughly the same problem people have with phones, when they use speed dial or memory dial all the time -- they forget the underlying phone number. In this situation, with the email address, it's the server deliberately losing the underlying email address.)

```
Public Sub CopyBouncedAddressesToDatabase()
    Dim conn As New ADODB.Connection
    Dim cmd As New ADODB.Command
    Dim rs As New ADODB.Recordset
    Dim AccessConnect As String

    AccessConnect = "Driver={Microsoft Access Driver (*.mdb)};" & _
        "Dbq=DATABASE.mdb;" & _
        "DefaultDir=C:\DATABASE;" & _
        "Uid=Admin;Pwd=;"
    conn.Open AccessConnect

    Dim inbox, bounces As Outlook.MAPIFolder
    Dim mail As Variant
    Dim body As String
    Dim lines As Variant
    Dim address As Variant
    Dim addressarray As Variant

    Set inbox = Outlook.Application.GetNamespace("MAPI").GetDefaultFolder(olFolderInbox)
    On Error GoTo NoBounces
    Set bounces = inbox.Folders.item("Bounces")
    On Error GoTo 0

    ct = bounces.Items.Count
    For i = ct To 1 Step -1
        Set mail = bounces.Items(i)
        lines = Split(mail.body, vbCrLf, 50)
        If UBound(lines) > 7 Then
            If lines(1) = "I'm afraid I wasn't able to deliver your message to the following addresses." _
                And InStr(lines(4), "@") Then
                ' matches qmail bounces
                address = Mid(lines(4), 2)
                address = Left(address, Len(address) - 2)
                conn.Execute "INSERT INTO tmpBouncingEmails (email) VALUES ('" & address & "')"
                mail.Delete
            ElseIf lines(0) = "Your message did not reach some or all of the intended recipients." _
                And InStr(lines(7), "@") Then
                ' matches exchange bounces
                address = LTrim(lines(7))
                addressarray = Split(address)
                address = addressarray(0)
                address = Replace(address, "'", "")
                conn.Execute "INSERT INTO tmpBouncingEmails (email) VALUES ('" & address & "')"
                mail.Delete
            ElseIf lines(0) = "Your message did not reach some or all of the intended recipients." _
                And (InStr(lines(9), "unknown user account>") _
                    Or InStr(lines(9), "User unknown>") _
                    Or InStr(lines(9), "No such user") _
                    Or InStr(lines(9), "Address rejected") _
                    Or InStr(lines(9), "Invalid recipient") _
                    Or InStr(lines(9), "User account is unavailable") _
                    Or InStr(lines(9), "Addressee unknown") _
                    Or InStr(lines(9), "Unable to deliver to") _
                    Or InStr(lines(9), "smtp;550") _
                ) _
            Then
                ' matches exchange bounces
                address = LTrim(lines(9))
                addressarray = Split(address)
                offs = 1
                For offs = 1 To UBound(addressarray)
                    If InStr(addressarray(offs), "@") Then Exit For
                Next
                If offs <= UBound(addressarray) Then
                    address = addressarray(offs)
                    address = Replace(address, "...User", "")
                    address = Replace(address, "'", "")
                    address = Replace(address, "<", "")
                    address = Replace(address, ">:", "")
                    address = Replace(address, ">...", "")
                    address = Replace(address, ">", "")
                    address = Replace(address, "(", "")
                    address = Replace(address, ")", "")
                    conn.Execute "INSERT INTO tmpBouncingEmails (email) VALUES ('" & address & "')"
                    mail.Delete
                End If
            ElseIf lines(1) = "Unable to deliver message to the following address(es)." _
                And InStr(lines(4), "@") Then
                ' matches first bounce in a yahoo.com bounce
                address = LTrim(lines(4))
                addressarray = Split(address)
                address = addressarray(7)
                address = Replace(address, "(", "")
                address = Replace(address, ")", "")
                conn.Execute "INSERT INTO tmpBouncingEmails (email) VALUES ('" & address & "')"
                mail.Delete
            ElseIf lines(0) = "Your message did not reach some or all of the intended recipients." _
                And (InStr(lines(9), "User account is overquota") Or _
                     InStr(lines(10), "User account is overquota")) Then
                ' just ignore this message - account is good
                mail.Delete
            ElseIf lines(0) = "Your message did not reach some or all of the intended recipients." Then
                ' at this point, we don't have an address for them
                ' so we'll just log their outlook contact name or something
                ' fixme
            End If
        End If ' lines.count > 7
    Next

    ' null out the bouncing email addresses
    conn.Execute "UPDATE tmpBouncingEmails INNER JOIN tblPeople ON tmpBouncingEmails.email = tblPeople.Email SET tblPeople.Email = Null"
    ' clear out the temporary table
    conn.Execute "DELETE * FROM tmpBouncingEmails"
    conn.Close
    Exit Sub

' called if the bounces folder does not exist
NoBounces:
    Set bounces = inbox
    Resume Next
End Sub
```

# MS Access: Address Cleanup Macros

Here are some Excel macros that help to clean up data. Once cleaned, it's easier to remove duplicates. (I used these to de-dupe a list exported from Outlook.) Included is a rough version of MS Access' Nz() function.
```
Public Sub SimplifyEmails()
    ' This subroutine scans a column, turning emails in this form:
    '   Joe Blow (joe@company.com)
    ' Into this form:
    '   joe@company.com
    Dim Rng As Range
    Set Rng = Application.Intersect(ActiveSheet.UsedRange, _
        ActiveSheet.Columns(ActiveCell.Column))
    Col = Rng.Column
    N = 0
    For R = Rng.Rows.Count To 2 Step -1
        V = ActiveSheet.Cells(R, Col).Value
        ' Debug.Print V
        If V <> Empty Then
            If Nz(InStr(V, "(")) < Nz(InStr(V, ")")) _
                And Nz(InStr(V, "(")) > 0 Then
                Start = InStr(V, "(") + 1
                Length = InStr(V, ")") - Start
                newmail = "'" & Mid(V, Start, Length)
                Debug.Print newmail
                ActiveSheet.Cells(R, Col).Value = newmail
            End If
        End If
    Next R
End Sub

Function Nz(a As Variant) As Variant
    If IsNull(a) Then
        Select Case a.Type
            Case xlNumber
                Nz = 0
            Case Else
                Nz = ""
        End Select
    Else
        Nz = a
    End If
End Function

Public Sub NormalizePhones()
    Dim Rng As Range
    Set Rng = Application.Intersect(ActiveSheet.UsedRange, _
        ActiveSheet.Columns(ActiveCell.Column))
    Col = Rng.Column
    N = 0
    For R = Rng.Rows.Count To 2 Step -1
        V = ActiveSheet.Cells(R, Col).Value
        ' Debug.Print V
        If V <> Empty Then
            ' first, replace . with -
            V = Replace(V, ".", "-")
            ' second, if there's a dash in position 4, then turn it into parens
            If InStr(V, "-") = 4 Then
                V = "(" & Mid(V, 1, 3) & ") " & Mid(V, 5)
            End If
            ' third, strip any double spaces (replace with single space)
            V = Replace(V, "  ", " ")
            ' fourth, if there's a space in position 4, then turn it into parens
            If InStr(V, " ") = 4 Then
                V = "(" & Mid(V, 1, 3) & ") " & Mid(V, 5)
            End If
            ActiveSheet.Cells(R, Col).Value = V
        End If
    Next R
End Sub

Public Sub TrimAllCells()
    ' removes leading and trailing spaces, and replaces double-spaces with single spaces
    Dim Rng As Range
    Set Rng = Application.Intersect(ActiveSheet.UsedRange, _
        ActiveSheet.Columns(ActiveCell.Column))
    Col = Rng.Columns.Count
    N = 0
    For R = Rng.Rows.Count To 2 Step -1
        For C = Col To 2 Step -1
            V = ActiveSheet.Cells(R, C).Value
            If V <> Empty Then
                ' strip any double spaces (replace with single space)
                V = Replace(V, "  ", " ")
                ' ltrim and rtrim the data
                V = LTrim(V)
                V = RTrim(V)
                ActiveSheet.Cells(R, C).Value = V
            End If
        Next C
    Next R
End Sub
```

# MS Access: Application Configuration Settings in Tables

This is a relational way to store application configuration in a table. It uses two tables, so you can store multiple configurations, so that you can use the tool over and over and still retain the old settings. One table stores configurations, and the other stores a single row with the current configuration in use.

Setting values are retrieved from the configuration tables with queries like this:

```
(SELECT PreRegActivityID FROM Congress7_Config
 WHERE ID=(SELECT CurrentConfigID FROM Congress7_CurrentConfig))
```

Attachment: AccessAppConfiguration.jpg (44.24 KB)

# MS Access: Automatically Jumping to the Only Record that Matches

Many years back, just before web pages got popular, I remember that some programs sent you as close as possible to your desired data whenever you searched. If you typed a search term, and only one record matched, you'd be taken to that record.
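That behavior boils down to a three-way branch on the number of matches. As a sketch (in JavaScript, with made-up openRecord/showList/complain callbacks standing in for the Access form operations):

```javascript
// Three-way dispatch on the match count. The callback names are
// placeholders for illustration, not anything from the Access project.
function dispatchSearch(matches, openRecord, showList, complain) {
  if (matches.length === 0) {
    complain("Nobody matches.");
  } else if (matches.length === 1) {
    openRecord(matches[0]); // jump straight to the only record
  } else {
    showList(matches);      // show a list of candidates to pick from
  }
}

var opened = null;
dispatchSearch(
  [42],
  function (id) { opened = id; },
  function () {},
  function () {}
);
console.log(opened); // → 42
```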
I have been using an Access db at work that doesn't have this feature. It's kind of a pain, because when you search, you sometimes get results that are one record, or no records at all. Below is code that will take you straight to the record if you type in a search term that's specific enough.

There's no magic shortcut here. You have to "peek" into the results to count the number of records your search will bring up, and behave accordingly. There's also some logic to distinguish between searches for full names and last names. It's another way to refine the search quickly. (BTW, you can't just drop this code into your project. You have to study it and replicate the logic for your own system. Sorry, lazy programmers.)

Here's some code to do that:

```
Private Sub ActFilter_AfterUpdate()
On Error GoTo Err_ActFilter_Click
    Dim stDocName As String
    Dim stLinkCriteria As String
    Dim f As String
    Dim first, last As String
    Dim offset As Long
    Dim dbs As Database
    Dim rst As Recordset
    Dim fedid As Variant

    Set dbs = CurrentDb

    ' if they type both first and last name, try to match on both
    f = LTrim(RTrim([ActFilter]))
    offset = InStr(1, f, " ")
    If (offset > 0) Then
        first = Left(f, offset - 1)
        last = Mid(f, offset + 1)
        stLinkCriteria = "[FName] Like " & SQuote(first & "*") & _
            " AND [LName] Like " & SQuote(last & "*")
    Else
        stLinkCriteria = "[LName] Like " & SQuote(f & "*") & _
            " OR Email Like " & SQuote(f & "*")
    End If

    ' peek into db to see if records exist
    Set rst = dbs.OpenRecordset("SELECT FEDID FROM tblActivists WHERE " & stLinkCriteria)

    ' if no records exist, don't show results
    If rst.EOF Then
        MsgBox "Nobody matches."
        rst.Close
        Exit Sub
    End If

    ' count how many results there are. if only 1, then jump to the record
    rst.MoveLast
    If (rst.RecordCount = 1) Then
        fedid = rst.Fields("FEDID")
        rst.Close
        ActFilter = ""
        DoCmd.OpenForm "frmActivists", , , "[FEDID] = " & fedid
        Exit Sub
    End If
    rst.Close

    ' if we have more than one record, show a list of records
    stDocName = "frmActivList"
    ActFilter = ""
    DoCmd.OpenForm stDocName, , , stLinkCriteria

Exit_ActFilter_Click:
    Exit Sub

Err_ActFilter_Click:
    MsgBox Err.Description
    Resume Exit_ActFilter_Click
End Sub
```

# MS Access: Comparing Queries Between Two Databases (a query diff)

Often, when you have MS Access in a small office, and have done the right thing and split the database into a backend of tables and a frontend of queries, reports, and forms, you end up with changes to the objects in multiple files. The trickiest part is comparing queries, because the query object is modified if even a column width is changed. You need to dig deeper and compare the queries' SQL.

The code below compares the local queries to queries in another database. In order to use it, you need to link the remote MSysObjects table. Call it MSysObjects-REMOTE-mdb. That's because we get lists of queries by dumping them from the hidden MSysObjects table rather than via the APIs. This way, we get all the queries.

You also need to create a table tblMultiMDBQueryComparison with the following fields: DBName text, ObjName text, ModDate datetime. We dump the query object info into this table first, then generate a temporary report from it.

Normally, I wouldn't post code that, imnsho, is so crappy, but there were a number of people online requesting a tool that does this, or something similar, like comparing object modification dates. Part of the reason it's so screwed-up looking is that it uses both DAO and ADO. It's cut-and-pasted from the www and my past work.

What's interesting is that DAO will always return the SQL for a query, but ADO will not. ADO doesn't return queries (called commands) when the underlying SQL contains a bug.
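Stripped of the DAO/ADO plumbing, the comparison is just an outer join of two name-to-SQL maps. A sketch of the idea in JavaScript (the data here is made up; it mirrors the shape of the Access tool, not its API calls):

```javascript
// Outer-join two {queryName: sql} maps, reporting entries that differ
// (including queries that exist on only one side, where the other
// side's value is undefined).
function diffQueries(localQs, remoteQs) {
  var names = {}, name, result = [];
  for (name in localQs) { names[name] = true; }
  for (name in remoteQs) { names[name] = true; }
  for (name in names) {
    if (localQs[name] !== remoteQs[name]) {
      result.push({ name: name, local: localQs[name], remote: remoteQs[name] });
    }
  }
  return result;
}

var diffs = diffQueries(
  { qryA: "SELECT 1", qryB: "SELECT 2" },
  { qryA: "SELECT 1", qryB: "SELECT 99" }
);
console.log(diffs.length); // → 1 (only qryB differs)
```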
"This isn't a bug, it's a feature." You could hack this to point the "remote" db back to the local db, and find all the buggy queries.

```
Sub DiffQueries()
    ' http://support.microsoft.com/kb/168336
    ' http://www.everythingaccess.com/tutorials.asp?ID=ADOX-programming-examples
    ' http://msdn.microsoft.com/en-us/library/aa164887%28v=office.10%29.aspx
    ' http://msdn.microsoft.com/en-us/library/windows/desktop/ms678060%28v=vs....
    ' http://oreilly.com/catalog/progacdao/chapter/ch08.html
    ' http://www.vb-helper.com/howto_adox_list_query_text.html
    Dim db As DAO.Database
    Dim rst As DAO.Recordset
    Dim qdf As DAO.QueryDef
    Dim q As DAO.QueryDef
    Dim cn As ADODB.Connection
    Dim rstNames As ADODB.Recordset
    Dim localdb As ADODB.Connection
    Dim remote As ADODB.Connection
    Dim cat As ADOX.Catalog
    Dim v As ADOX.View
    Dim cmd As ADODB.Command

    ' Use this as a model for dumping objects into the table.
    s = "INSERT INTO tblMultiMDBQueryComparison ( DBName, ObjName, ModDate ) " & _
        "SELECT 'LOCAL' AS DBName, MSysObjects.Name AS ObjName, MSysObjects.DateUpdate " & _
        "FROM MSysObjects WHERE ((MSysObjects.Type)=5) "
    Set db = CurrentDb

    ' Load the local objects
    db.Execute ("DELETE FROM tblMultiMDBQueryComparison")
    db.Execute s

    s = "INSERT INTO tblMultiMDBQueryComparison ( DBName, ObjName, ModDate ) " & _
        "SELECT 'mdb' AS DBName, MSysObjects.Name AS ObjName, MSysObjects.DateUpdate " & _
        "FROM MSysObjects-REMOTE-mdb as MSysObjects WHERE ((MSysObjects.Type)=5)"
    db.Execute s

    db.Execute "DELETE FROM tblMultiMDBQueryComparison WHERE ObjName LIKE '~*'"

    ' Create a table of object names.
    On Error Resume Next
    db.Execute "drop table tmpMultiMDBQueryComparison"
    db.Execute "create table tmpMultiMDBQueryComparison " & _
        "(ObjName text, LOCAL datetime, LOCALQuery memo, mdb datetime, mdbQuery memo, Newest text)"
    ' just in case the drop fails, and the table exists
    db.Execute "DELETE FROM tmpMultiMDBQueryComparison"

    s = "INSERT INTO tmpMultiMDBQueryComparison (ObjName) SELECT DISTINCT ObjName FROM tblMultiMDBQueryComparison"
    db.Execute s

    Set cat = New ADOX.Catalog
    Set localdb = CurrentProject.Connection ' Connect to current database.

    On Error GoTo AdoError
    Set remote = New ADODB.Connection
    remote.ConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
        "Data Source=C:\PATH\DATA.mdb;"
    remote.Open
    Set cat.ActiveConnection = remote

    Set rst = db.OpenRecordset("tmpMultiMDBQueryComparison", dbOpenTable)
    On Error GoTo 0

    rst.MoveFirst
    While (Not rst.EOF)
        qName = rst.Fields("ObjName")
        For Each q In CurrentDb.QueryDefs
            If q.name = qName Then
                rst.Edit
                rst.Fields("LOCALQuery").Value = q.sql
                rst.Fields("LOCAL").Value = q.LastUpdated
                rst.Update
            End If
        Next
        For Each v In cat.Views
            If v.name = qName Then
                Set cmd = v.Command
                rst.Edit
                rst.Fields("mdbQuery").Value = cmd.CommandText
                rst.Fields("mdb").Value = v.DateModified
                rst.Update
            End If
        Next
        rst.MoveNext
    Wend
    Exit Sub

AdoError:
    i = 1
    On Error Resume Next
    ' Enumerate Errors collection and display properties of
    ' each Error object (if Errors Collection is filled out)
    Set Errs1 = remote.Errors
    For Each errLoop In Errs1
        With errLoop
            strTmp = strTmp & vbCrLf & "ADO Error # " & i & ":"
            strTmp = strTmp & vbCrLf & "  ADO Error # " & .Number
            strTmp = strTmp & vbCrLf & "  Description " & .Description
            strTmp = strTmp & vbCrLf & "  Source " & .Source
            i = i + 1
        End With
    Next

AdoErrorLite:
    ' Get VB Error Object's information
    strTmp = strTmp & vbCrLf & "VB Error # " & Str(Err.Number)
    strTmp = strTmp & vbCrLf & "  Generated by " & Err.Source
    strTmp = strTmp & vbCrLf & "  Description " & Err.Description
    MsgBox strTmp

    ' Clean up gracefully without risking infinite loop in error handler
    On Error GoTo 0
End Sub
```

# MS Access: Display A Subreport Even When There Are No Records

Seems like a lot of people are having a problem because Access automatically hides a subreport if it contains no records. Ref: PC Review, Experts Exchange, ASPFree. After digging through the various report and widget properties, there appears to be no property that will automatically display the subreport if there are no records.

The way I finally got a subreport to display was to create a query which returns all the records, and also returns "blank" rows for nonexistent records.

In this example, we have three tables: Orgs, People, and Positions. Positions is a table that has OrgID and PeopleID columns, and joins the other two tables. Positions has a column "HasRecord", a boolean that indicates that the person has a record. Positions contains not only the related records of interest, but other records as well; HasRecord indicates that this is a record we're looking for.

This query will get you the list of all the orgs, with additional columns where there is a matching record in Positions where HasRecord is true.

```
SELECT * FROM
  ( SELECT OrgID FROM Orgs ) a
  LEFT JOIN
  ( SELECT OrgID, PeopleID FROM Positions WHERE HasRecord=TRUE ) b
  ON a.OrgID=b.OrgID
```

That gets you your result set, and you can then add some columns with a regular join:

```
SELECT Name, Address, Title FROM
  ( SELECT * FROM
    ( SELECT OrgID FROM Orgs ) a
    LEFT JOIN
    ( SELECT OrgID, PeopleID FROM Positions WHERE HasRecord=TRUE ) b
    ON a.OrgID=b.OrgID ) c
  INNER JOIN People p ON c.PeopleID=p.PeopleID
```

## What didn't work

One commenter suggested using a UNION query to add blank rows to the result.
I tried doing this:

```
SELECT OrgID, Null FROM Orgs
UNION
SELECT OrgID, PeopleID FROM Positions WHERE HasRecord=TRUE
```

This didn't work because you'd end up with a blank line before records that exist.

Another possibility was to do a LEFT JOIN between Orgs and Positions, and match not only on HasRecord=TRUE but also when it's NULL:

```
SELECT * FROM Orgs
LEFT JOIN Positions ON Orgs.OrgID=Positions.OrgID
WHERE HasRecord=True OR HasRecord IS NULL
```

That doesn't work because HasRecord, if it exists, is either True or False, so the False rows are excluded. This means if a related record exists in Positions, but is not HasRecord=True, then an OrgID for that organization won't show up in the results. If we include "OR HasRecord=False" in the statement, we end up selecting everything, including records we don't want. It just doesn't work.

# MS Access: Geocoding and Distance Reporting

This is some code and controls that help you geocode addresses, and prepare a report of addresses sorted by distance from a point. It's based on the Excel Geocoding Tool, but expands on it by adding a few features, including caching of calculated locations.

Addresses are stored in their own table, and are normalized a little bit, so that you don't end up geocoding the same address over and over. (For example, if you have 50 people at an office, that location should only be geocoded once.) The code also shows how to change the SQL datasource of a report in VBA code.

The code's incomplete, and it's not a drop-in library. Integration will take some effort. There probably won't be any other "releases".

[I've been fixing up the code. This original code is a mess, and there are some weird things going on because I didn't understand VBA exception handling ( http://www.zipcon.net/~bobrosen/professional/exceptions.html ).]
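For the distance part, the standard way to measure between two latitude/longitude points is the great-circle "haversine" formula. This is an assumption about what a tool like this computes (the kit's own math may differ); a sketch in JavaScript:

```javascript
// Haversine great-circle distance, in kilometers.
// Standard textbook formula; not taken from the kit itself.
function haversineKm(lat1, lon1, lat2, lon2) {
  var R = 6371; // mean Earth radius in km
  var toRad = Math.PI / 180;
  var dLat = (lat2 - lat1) * toRad;
  var dLon = (lon2 - lon1) * toRad;
  var a = Math.sin(dLat / 2) * Math.sin(dLat / 2) +
          Math.cos(lat1 * toRad) * Math.cos(lat2 * toRad) *
          Math.sin(dLon / 2) * Math.sin(dLon / 2);
  return 2 * R * Math.asin(Math.sqrt(a));
}

// One degree of latitude is roughly 111 km:
console.log(haversineKm(0, 0, 1, 0)); // ≈ 111.19
```

Sorting a list of geocoded addresses by distance is then just an ordinary sort on this value.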
Attachment: GeocodingDistanceKit.zip (56.59 KB)

# MS Access: Inserting Blank Rows

This is a way to insert empty or empty-like rows into a list of "seats" that contains not only reservations, but also a number saying how many seats a group of people has. If that number is greater than the number of reserved seats, this adds new blank rows for the empty seats.

```
Sub insertBlankRows()
    Dim dbs As Database, qdf As QueryDef, strSQL As String
    Dim rst As Recordset
    Set dbs = CurrentDb
    strSQL = "SELECT tblSeats.OrganizationId, [MaxOfSeats]-Count([OrganizationId]) AS Difference, " & _
        " Count(tblSeats.OrganizationId) AS CountOfOrganizationId, Max(tblSeats.Seats) AS MaxOfSeats " & _
        " FROM tblSeats GROUP BY tblSeats.OrganizationId;"
    Set rst = dbs.OpenRecordset("qryDifferences", dbOpenForwardOnly)
    While (Not rst.EOF)
        For i = 1 To rst!Difference
            insSQL = "INSERT into tblSeats (OrganizationID, LastName, FirstName) VALUES (" _
                & rst!OrganizationId & ", '', '')"
            ' MsgBox (insSQL)
            dbs.Execute (insSQL)
        Next
        rst.MoveNext
    Wend
End Sub
```

Attachment: emptyrows.jpg (90.86 KB)

# MS Access: Inserting Records with Visual Basic and DAO

This example shows you how to add records with VBA and DAO instead of with SQL queries. Sometimes, it's easier to do it this way. (The original intent was to simultaneously create a relation between the new record and another table, but this didn't happen.)
```
Public Sub importFolks()
    Dim dbs As Database
    Dim rstFrom As Recordset
    Dim rstTo As DAO.Recordset
    Set dbs = CurrentDb()
    Set rstFrom = dbs.OpenRecordset("tmp match up list to db")
    Set rstTo = dbs.OpenRecordset("tblActivists", dbOpenDynaset, dbAppendOnly)
    rstFrom.MoveFirst
    a = 0
    Do Until (rstFrom.EOF)
        rstTo.AddNew
        rstTo.Fields("Fname") = rstFrom.Fields("Field11")
        rstTo.Fields("Lname") = rstFrom.Fields("Field12")
        rstTo.Fields("Email") = Nz(rstFrom.Fields("email"))
        parts = ParsePhoneNumbers(Nz(rstFrom.Fields("Phone")), 1)
        rstTo.Fields("WCode") = parts(1)
        rstTo.Fields("WPhone") = parts(2)
        parts = ParsePhoneNumbers(Nz(rstFrom.Fields("FAX")), 1)
        rstTo.Fields("FCode") = parts(1)
        rstTo.Fields("Fax") = parts(2)
        rstTo.Fields("Cell") = rstFrom.Fields("cellNumber")
        rstTo.Update
        rstFrom.MoveNext
        a = a + 1
    Loop
End Sub
```

# MS Access: Inserting and Deleting Contact Items With VBA

Gripe: VBA syntax is difficult. The object system is a little confusing too. It's just very hard to use. To make things even more difficult, the sample code out there is kind of *weird*. Maybe there are some good reasons for doing things their way, but it just seems verbose, error-prone, and hard to write, to me.

Here's some code that is the start of a library to work with Outlook's folders. It's based on some code samples from the web, refactored into something resembling a library. The best feature is the function OLGetSubFolder, which returns a MAPI folder object for a given path. Totally useful.

I don't really understand why the first folder is under folders.Item(1), but the sample code used that, so I'm calling that the root folder. Maybe there are folders above that, and this is wrong.

Also featured in this code are a function to test for the existence of an object, and a function to create folders.
```vba
Option Compare Database

Public Sub test()
    Dim foldroot As Outlook.MAPIFolder
    Dim foldr As Outlook.MAPIFolder
    Dim newfolder As Outlook.MAPIFolder
    Set foldroot = OLGetRootUserFolder()
    Set foldr = OLGetSubFolder(foldroot, "\\Contacts")
    Set foldr = OLMakeFolder(foldr, "Lists")
    Set newfolder = OLMakeFolder(foldr, "Executive Board")
    Set newfolder = OLMakeFolder(foldr, "Delegates")
    Set newfolder = OLMakeFolder(foldr, "COPE Board")
    OLExportQueryToFolder newfolder, "prmCOPEBOARD"
    Set newfolder = OLMakeFolder(foldr, "Affiliates Offices")
End Sub

Public Sub OLExportQueryToFolder(folder As Outlook.MAPIFolder, query As String)
    Dim sFname As String, sLname As String, sEmail As String
    Dim dbs As Database
    Dim rst As Recordset
    Set dbs = CurrentDb
    Set rst = dbs.OpenRecordset(query, dbOpenForwardOnly)
    While Not rst.EOF
        If IsNull(rst!Fname) Then sFname = "" Else sFname = rst!Fname
        If IsNull(rst!Lname) Then sLname = "" Else sLname = rst!Lname
        If IsNull(rst!email) Then sEmail = "" Else sEmail = rst!email
        OLInsertContactItem folder, sFname, sLname, sEmail
        rst.MoveNext
    Wend
End Sub

' Returns the named subfolder, creating it if it doesn't already exist.
Public Function OLMakeFolder(foldr As Outlook.MAPIFolder, newfolder As String) As Outlook.MAPIFolder
    Dim f As Outlook.MAPIFolder
    On Error GoTo FolderDoesNotExist
FolderExists:
    Set f = foldr.folders(newfolder)
    Set OLMakeFolder = f
    Exit Function
FolderDoesNotExist:
    Set f = foldr.folders.Add(newfolder)
    Set OLMakeFolder = f
End Function

' based on http://www.programmingmsaccess.com/Samples/VBAProcs/VBAProcsToManageOutl...

Public Sub OLInsertContactItem(foldr As Outlook.MAPIFolder, ByVal first As String, ByVal last As String, ByVal email As String)
    Dim cit1 As Outlook.ContactItem
    Set cit1 = foldr.Items.Add(olContactItem)
    With cit1
        .FirstName = first
        .LastName = last
        .Email1Address = email
        .Save
    End With
End Sub

Private Sub OLDeleteAllInFolder(MAPIFolder As Outlook.MAPIFolder)
    Dim c As Object
    Dim i As Outlook.Items
    Set i = MAPIFolder.Items
    For Each c In i
        c.Delete
    Next
End Sub

' based on http://msdn2.microsoft.com/en-us/library/bb756875.aspx
' Walks a backslash-separated path, descending one subfolder per segment.
Private Function OLGetSubFolder(MAPIFolderRoot As Outlook.MAPIFolder, folderPath As String) As Outlook.MAPIFolder
    Dim returnFolder As Object
    Dim parts() As String
    Dim part
    Set returnFolder = MAPIFolderRoot
    parts = Split(folderPath, "\")
    For Each part In parts
        ' Debug.Print "-" & part & "-"
        If part <> "" Then
            Set returnFolder = returnFolder.folders.Item(part)
        End If
    Next
    Set OLGetSubFolder = returnFolder
End Function

Private Function OLGetRootUserFolder() As Outlook.MAPIFolder
    Dim ola1 As Outlook.Application
    Set ola1 = CreateObject("Outlook.Application")
    Set OLGetRootUserFolder = ola1.GetNamespace("MAPI").folders.Item(1)
End Function
```

# MS Access: Logging Messages

Here's some code to help you log messages to a table.

First, make a table called tblLog with at least these columns: Timestamp, User, Computer, Message. (You don't need a primary key.) Set the default value of Timestamp to NOW().

Copy the following code into a code module. Also, add a reference to the "Active DS Type Library" - it has the Active Directory functions you need to discover the username.
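The table-with-a-default-timestamp pattern described above can be sketched outside Access; here it is with SQLite from Python, as an illustration only (in Access the default would be `Now()` set in the table designer, and the parameter placeholders stand in for the string-concatenated SQL in the VBA below):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tblLog (
        Timestamp TEXT DEFAULT CURRENT_TIMESTAMP,  -- Access: default value NOW()
        User      TEXT,
        Computer  TEXT,
        Message   TEXT
    )
""")

def log_message(user, computer, message):
    # Parameter placeholders sidestep the quote-escaping problem entirely
    conn.execute("INSERT INTO tblLog (User, Computer, Message) VALUES (?, ?, ?)",
                 (user, computer, message))

log_message("jdoe", "FRONTDESK1", "User opened database.")
row = conn.execute("SELECT User, Message, Timestamp FROM tblLog").fetchone()
print(row[0], row[1])  # jdoe User opened database.
```

Because Timestamp has a default, the insert only has to supply the three interesting columns.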
```vba
Function StartUp()
    ' Called from the AutoExec macro. The dummy variable exists only
    ' because the macro's RunCode action needs a function to call.
    Dim dummy
    dummy = LogOpen()
    DoCmd.OpenForm "frmHidden", acNormal, , , , acHidden
    StartUp = Null
End Function

Function LogOpen()
    LogMessage ("User opened database.")
End Function

Function LogClose()
    LogMessage ("User closed database.")
End Function

Function LogMessage(Mess As String)
    Dim sysInfo As New ActiveDs.WinNTSystemInfo
    Dim UserName As String
    UserName = sysInfo.UserName
    If UserName <> "" Then
        Dim dbs As Database
        Set dbs = CurrentDb
        ' Double any single quotes so an apostrophe in the message
        ' doesn't break the SQL statement.
        dbs.Execute ("INSERT INTO tblLog (User, Computer, Message) VALUES ('" & sysInfo.UserName & _
            "','" & sysInfo.ComputerName & _
            "','" & Replace(Mess, "'", "''") & "')")
    End If
    LogMessage = True
End Function
```

(StartUp is still awkward; the dummy assignment is there only because a macro's RunCode action expects a function call.)

To enable startup and shutdown logging, create a macro called AutoExec, and in the macro, call the StartUp function. Then create a new form called "frmHidden", and add a handler for its Close event. In that event, call the LogClose function. Save all that.

What's happening is that the frmHidden form is opened, hidden, during startup. Then, during shutdown, its Close event handler fires. This is a crappy hack. Improvements are appreciated.

# MS Access: Printing the Range of Data on the Page on a Report

I wanted to print a report that indicates the first and last item on each page, just like a dictionary does. You know: "Azeri - Babcock", "Milk - Minder". It makes it easier to flip through printouts. This is how to do it.

It will put the range in the footer. I haven't figured out how to do one in the header, which is what I originally wanted, but found too difficult to do. (There is probably a way.)

First, take your report and add an unbound field to it. Rename the field to "Range". See the picture below. Then, set up event handlers for the On Print event of each section. An explanation follows the picture.
Here's my code:

```vba
Option Compare Database
Option Explicit

Public FirstRow As String
Public CurrentRow As String

' All this code fails. I may need to work out a way to put ranges on the
' pages by running this report once to fill values, and again to
' re-populate the report with ranges.

Private Sub Detail_Print(Cancel As Integer, FormatCount As Integer)
    CurrentRow = [OrgName]
    If FirstRow = "" Then
        FirstRow = CurrentRow
    End If
End Sub

Private Sub PageFooterSection_Print(Cancel As Integer, FormatCount As Integer)
    [Range] = FirstRow & " to " & CurrentRow
End Sub

Private Sub PageHeaderSection_Print(Cancel As Integer, FormatCount As Integer)
    ' clear out the tracking variable
    FirstRow = ""
End Sub
```

Okay, it's pretty simple. Every report is made up of sections, and Access fires events for each section while the report renders, so you can execute code during rendering. This code keeps track of the first and current values of OrgName (the field we sort and group on). By the time we get to the footer, the current value holds the last value on the page. These two values are concatenated and written to the [Range] field.

Putting this value at the top of the page is hard, because the top is laid out before the bottom, and I can't figure out a way to cause the top to be reformatted before the final rendering.

*Attachment: Range.jpg (114.05 KB)*

# MS Access: Quoting Strings in SQL

I was having a real WTF moment with Access. I'd coded up an SQL query in Access, and a string with a single quote in it was fouling up the query. The SQL was something like this:

```sql
SELECT * FROM Places WHERE Name='Joe's Bar'
```

Obviously, I forgot to quote the string correctly. For some reason, web searches didn't really turn up much about quoting text strings in SQL statements in Access. There was a lot of code that looked like this:

```vba
sql = "SELECT * FROM Places WHERE NAME='" & name & "'"
```

My code was like that too, because that's what everyone was doing.
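To make the failure, and the fix, concrete: here is the same situation demonstrated with SQLite from Python, for illustration only. SQLite happens to follow the same rule as Jet/Access SQL, where an embedded single quote is escaped by doubling it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Places (Name TEXT)")

name = "Joe's Bar"

def squote(s):
    # Double every embedded single quote, then wrap the result in quotes
    return "'" + s.replace("'", "''") + "'"

# Naive interpolation produces  ...VALUES ('Joe's Bar')  -- a syntax error
try:
    conn.execute("INSERT INTO Places (Name) VALUES ('" + name + "')")
except sqlite3.OperationalError as e:
    print("broken:", e)

# The escaped version works
conn.execute("INSERT INTO Places (Name) VALUES (" + squote(name) + ")")
print(conn.execute("SELECT Name FROM Places").fetchone()[0])  # Joe's Bar
```

The naive string lands in the database engine as `'Joe'` followed by stray tokens, which is why the error surfaces as a syntax error rather than anything about quoting.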
What's funny is that I've used parameterized queries in Java, and written similar tools for PHP, but back in VBA, I use that broken style. Knowing the right way to do it, I googled for notes about using parameterized queries in MS Access and Jet. It looked hard, verbose, and a little confusing. Further searches turned up results about quoting strings, but they were kind of "not pretty":

```vba
sql = "SELECT * FROM Places WHERE NAME='" & Replace(name,"'","''") & "'"
```

Well, at least it's explicit. Instead, here's a half-way solution that cleans up the code a bit. It's inspired by Perl DBI's quote function, which escapes quotes and also adds quotes around the string:

```vba
' Single-quote a string (and escape its contents).
Public Function SQuote(s As String) As String
    SQuote = "'" & Replace(s, "'", "''") & "'"
End Function

' Adds a trailing comma, so you can create constructions like:
'   SQuoteComma(foo) & SQuoteComma(bar)
' Result: 'foo''svalue','bar''svalue',
Public Function SQuoteComma(s As String) As String
    SQuoteComma = SQuote(s) & ","
End Function

Public Function DQuote(s As String) As String
    DQuote = """" & Replace(s, """", """""") & """"
End Function

Public Function DQuoteComma(s As String) As String
    DQuoteComma = DQuote(s) & ","
End Function
```

Now the statement looks like this:

```vba
sql = "SELECT * FROM Places WHERE NAME=" & SQuote(name)
```

Also, if you have an INSERT statement, you can construct a comma-separated list of strings like this:

```vba
sql = "INSERT INTO Places (Name,Street,City) VALUES (" & _
    SQuoteComma(name) & SQuoteComma(street) & SQuote(city) & _
    ")"
```

Even with the long function name, it's fewer characters than "'" & "'".

# MS Access: Can't Add New Record to Subform

A subform we were entering data into stopped working. One day it was working; the next, it was not. The problem turned out to be the record source: the underlying query started with "select distinct".
For some reason, probably because there were duplicate records in the underlying table, the DISTINCT caused the form to stop accepting edits -- the query became read-only. The solution was to set the query's Unique Values property to "No", which removes the DISTINCT. Some posts on the web say as much: the record source has to be writeable, meaning it can't be a UNION, most JOINs, or a DISTINCT.

# MS Excel: Cleverer Table Importer

These are some functions that help you write a script to import Excel data into a SQL database. What makes this different from the Access import feature is that the data can be poorly formatted.

This specific code is for the Crystal Reports export feature. Crystal exports data by converting the final report output to an Excel sheet, but the sheet includes the headers and titles, as well as blank columns. In short, it's not ready to import. Additionally, the CSV export feature of Crystal spits out incomplete data, so the Excel export is the best option.

So, what we need is an importer that can read data with empty columns, and with a header line a few rows down the page. This partially completed importer works by finding and analyzing the header line for column names, and noting which column name goes with which column number. With the offsets of each column, it then loops over the table, mapping each column back to a column name, and uses that to create an SQL string to insert the data. We also pass in some hints about which fields to quote, and which to convert from date serials to textual dates.

This code doesn't yet have the code needed to actually insert the data into the table. The final version will run within Access, and control an instance of Excel.
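The approach just described -- find the header row, record which column number holds which field name, then walk the rows building an assignment list for the INSERT -- can be sketched in Python for illustration. These helper names are hypothetical, and the `INSERT ... SET` output merely mirrors the assignment-list style the VBA below produces; the real code is VBA driving Excel:

```python
def get_heading_offsets(rows, header_row):
    """Map column index -> field name for non-empty header cells."""
    return {c: name for c, name in enumerate(rows[header_row]) if name != ""}

def get_row(rows, r, offsets):
    """Return one data row as a dict keyed by field name."""
    return {name: rows[r][c] for c, name in offsets.items()}

def build_sql(table, data, quotes):
    """Build an assignment list, quoting the fields hinted as strings."""
    parts = []
    for field, value in data.items():
        if quotes.get(field) == "quote":
            parts.append("%s='%s'" % (field, value))
        else:
            parts.append("%s=%s" % (field, value))
    return "INSERT INTO " + table + " SET " + ", ".join(parts)

# A sheet with a title line, a blank column, then the header row
sheet = [
    ["Quarterly Report", "", ""],
    ["Customer #", "", "Members"],
    ["A100", "", 12],
]
offsets = get_heading_offsets(sheet, 1)
row = get_row(sheet, 2, offsets)
print(build_sql("foo", {"code": row["Customer #"], "Mems": row["Members"]},
                {"code": "quote", "Mems": "number"}))
# INSERT INTO foo SET code='A100', Mems=12
```

The offsets dict is what lets the blank column disappear: only columns with a non-empty header ever make it into a row.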
```vba
' Requires a reference to Microsoft Scripting Runtime (for Dictionary).
Public Sub test()
    Dim offsets As Dictionary
    Dim quotes As New Dictionary
    Dim row As Dictionary
    Dim dest As New Dictionary
    Dim n As String
    Dim Sql As String

    quotes.Add "code", "quote"
    quotes.Add "PaidThrough", "date"
    quotes.Add "Mems", "number"
    quotes.Add "UpdateTime", "quote"
    n = Format(Now(), "yyyy/mm/dd")
    import_goto_start ("Customer #")
    Set offsets = import_get_heading_offsets
    ' move cursor down one cell
    While (Application.Selection <> "")
        Application.ActiveCell.Offset(1, 0).Select
        Set row = import_get_row(offsets)
        dest.RemoveAll
        dest.Add "code", row("Customer #")
        dest.Add "PaidThrough", row("through")
        dest.Add "Mems", row("Members")
        dest.Add "UpdateTime", n
        Sql = import_build_sql("foo", dest, quotes)
        Debug.Print Sql
    Wend
End Sub

Public Sub import_goto_start(search As String)
    ' Moves the cursor to the first likely line of data, which is the first
    ' cell of the header row. Call this before anything else.
    Dim r As Integer, c As Integer
    r = 1
    While (r < 20)
        c = 1
        While (c < 5)
            With Workbooks(1).Worksheets(1)
                If (.Cells(r, c) = search) Then
                    .Cells(r, c).Select
                    Exit Sub
                End If
            End With
            c = c + 1
        Wend
        r = r + 1
    Wend
End Sub

Function import_get_heading_offsets() As Dictionary
    ' Returns a dictionary mapping column numbers to field names.
    Dim res As New Dictionary
    Dim r As Integer
    Dim c As Integer
    Dim col As Integer
    Dim Heading As String
    With Workbooks(1).Worksheets(1)
        c = Application.ActiveCell.Column
        r = Application.ActiveCell.row
        For col = c To 100
            Heading = .Cells(r, col).Value2
            If Heading <> "" Then
                res.Add col, Heading
            End If
        Next
    End With
    ' return that dictionary
    Set import_get_heading_offsets = res
End Function

Function import_get_row(offsets As Dictionary) As Dictionary
    ' Returns a row of data as an associative array keyed by field name.
    Dim res As New Dictionary
    Dim r As Integer, col As Integer
    With Workbooks(1).Worksheets(1)
        r = Application.ActiveCell.row
        ' what is the way to scan the row based on the collection's contents???
        For col = 1 To 10
            If offsets.Exists(col) Then
                res.Add offsets.Item(col), .Cells(r, col).Value2
                'Debug.Print "Adding " & .Cells(r, col).Value2 & " : " & offsets.Item(col)
            Else
                'Debug.Print "Column " & col & " ignored. " & offsets.Item(col) & " : " & .Cells(r, col).Value2
            End If
        Next
    End With
    Set import_get_row = res
End Function

Function import_build_sql(table As String, data As Dictionary, quotes As Dictionary) As String
    ' Takes an associative array as input and generates an "insert"
    ' for the table. The field names must match.
    Dim s As String
    Dim d As Variant
    s = ""
    For Each d In data
        If s <> "" Then s = s & ", "
        If (quotes(d) = "quote") Then
            s = s & " " & d & "='" & data(d) & "'"
        ElseIf (quotes(d) = "date") Then
            s = s & " " & d & "='" & Format(data(d), "yyyy/mm/dd") & "'"
        Else
            s = s & " " & d & "=" & data(d)
        End If
    Next
    s = "INSERT INTO " & table & s
    import_build_sql = s
End Function

' PHP pseudocode for the overall flow:
' offsets = import_get_heading_offsets()
' while( row = import_get_row(offsets) ) :
'     new['field1'] = row['fieldx']
'     ...
'     sql = import_build_sql('table', new)
'     cn.execute sql
' endwhile
```

The code's a little dirty. VBA Dictionaries were hard to learn, because the MS docs tend to have only simple example code. There are a few places I wish were more efficient.

# MS Excel: Moving the Cursor to the First Occurrence of a String

This code is part of an Excel importer project for Access. The data is kind of weird, and can't be imported via the normal importer. I'm using FunctionX's VBA for Excel tutorial as a reference.

```vba
Public Sub test()
    import_goto_start ("Customer #")
End Sub

Public Sub import_goto_start(search As String)
    ' Moves the cursor to the first likely line of data, which is the first
    ' cell of the header row. Call this before anything else.
    Dim r As Integer
    Dim c As String
    r = 1
    While (r < 20)
        c = "A"
        While (c <> "E")
            With Workbooks(1).Worksheets(1)
                If (.Range(c & r) = search) Then
                    .Range(c & r).Select
                    Exit Sub
                End If
            End With
            c = Chr(Asc(c) + 1)
        Wend
        r = r + 1
    Wend
End Sub
```

# MS Outlook and Access: Recording Bounced Email Addresses

This is the start of a macro that scans your Outlook Inbox, or a subfolder named "Bounces", for bounce messages, and records them to an Access database. The BouncingEmails.mdb file contains a single table, named "bouncing", with a single column named "email". This code only matches qmail and Exchange bounce messages. Each server has its own message format, so each format needs a little code of its own.

```vba
' This scans the current folder and copies the bouncing email address to
' C:\DB\BouncingEmails.mdb
Public Sub CopyBouncedAddressesToDatabase()
    Dim conn As New ADODB.Connection
    Dim cmd As New ADODB.Command
    Dim rs As New ADODB.Recordset
    Dim AccessConnect As String
    AccessConnect = "Driver={Microsoft Access Driver (*.mdb)};" & _
        "Dbq=BouncingEmails.mdb;" & _
        "DefaultDir=C:\DB;" & _
        "Uid=Admin;Pwd=;"
    conn.Open AccessConnect

    Dim inbox As Outlook.MAPIFolder, bounces As Outlook.MAPIFolder
    Dim mail As Variant
    Dim lines As Variant
    Dim address As Variant
    Dim addressarray As Variant
    Dim ct As Long, i As Long
    Set inbox = Outlook.Application.GetNamespace("MAPI").GetDefaultFolder(olFolderInbox)
    On Error GoTo NoBounces
    Set bounces = inbox.Folders.item("Bounces")
    On Error GoTo 0

    ct = bounces.Items.Count
    For i = ct To 1 Step -1
        Set mail = bounces.Items(i)
        lines = Split(mail.body, vbCrLf, 50)
        If UBound(lines) > 7 Then
            If lines(1) = "I'm afraid I wasn't able to deliver your message to the following addresses." _
                And InStr(lines(4), "@") Then
                ' matches qmail bounces
                address = Mid(lines(4), 2)
                address = Left(address, Len(address) - 2)
                conn.Execute "INSERT INTO bouncing (email) VALUES ('" & address & "')"
                mail.Delete
            ElseIf lines(0) = "Your message did not reach some or all of the intended recipients." _
                And InStr(lines(7), "@") Then
                ' matches exchange bounces
                address = LTrim(lines(7))
                addressarray = Split(address)
                address = addressarray(0)
                conn.Execute "INSERT INTO bouncing (email) VALUES ('" & address & "')"
                mail.Delete
            End If
        End If ' UBound(lines) > 7
    Next
    conn.Close
    Exit Sub

' called if the bounces folder does not exist
NoBounces:
    Set bounces = inbox
    Resume Next
End Sub
```

# MS Outlook: Remove Duplicate Contacts

This is a pretty good de-duper, based on one posted to a forum. This version normalizes some data so records will match even when they're formatted differently.

```vba
' http://www.hardforum.com/printthread.php?t=854485
' by pbj75
Public Sub deleteDuplicateContacts()
    Dim oldcontact As ContactItem, newcontact As ContactItem, j As Integer
    Dim myNameSpace As NameSpace
    Dim myfolder As MAPIFolder
    Dim myitems As Items
    Dim totalcount As Long, i As Long
    Set myNameSpace = GetNamespace("MAPI")
    Set myfolder = myNameSpace.GetDefaultFolder(olFolderContacts)
    Set myitems = myfolder.Items
    myitems.Sort "[File As]", olDescending
    totalcount = myitems.Count

    ' Skip past any leading non-contact items (e.g. distribution lists).
    j = 1
    While ((j < totalcount) And (myitems(j).Class <> olContact))
        j = j + 1
    Wend
    Set oldcontact = myitems(j)

    For i = j + 1 To totalcount
        If (myitems(i).Class = olContact) Then
            Set newcontact = myitems(i)
            If ((newcontact.LastNameAndFirstName = oldcontact.LastNameAndFirstName) And _
                (NormPhone(newcontact.PagerNumber) = NormPhone(oldcontact.PagerNumber)) And _
                (NormPhone(newcontact.MobileTelephoneNumber) = NormPhone(oldcontact.MobileTelephoneNumber)) And _
                (NormPhone(newcontact.HomeTelephoneNumber) = NormPhone(oldcontact.HomeTelephoneNumber)) And _
                (NormPhone(newcontact.BusinessTelephoneNumber) = NormPhone(oldcontact.BusinessTelephoneNumber)) And _
                (NormAddress(newcontact.BusinessAddress) = NormAddress(oldcontact.BusinessAddress)) And _
                (newcontact.Email1Address = oldcontact.Email1Address) And _
                (newcontact.HomeAddress = oldcontact.HomeAddress) And _
                (newcontact.CompanyName = oldcontact.CompanyName)) Then
                ' use FTPSite as a flag to mark duplicates
                newcontact.FTPSite = "DELETEME"
                newcontact.Save
            Else
                newcontact.FTPSite = ""
                newcontact.Save
            End If
            Set oldcontact = newcontact
        End If
    Next i
End Sub

Public Function NormPhone(ByVal p As String) As String
    ' First, replace . with -
    p = Replace(p, ".", "-")
    ' If the 4th character is "-", rewrite the number as (nnn) nnn-nnnn
    If (Mid(p, 4, 1) = "-") Then
        p = "(" & Mid(p, 1, 3) & ") " & Mid(p, 5)
    End If
    ' Make sure there's a space after the closing paren
    If (Mid(p, 5, 1) = ")" And Mid(p, 6, 1) <> " ") Then
        p = Mid(p, 1, 5) & " " & Mid(p, 6)
    End If
    NormPhone = p
End Function

Public Function NormAddress(ByVal a As String) As String
    a = Replace(a, "USA", "")
    a = Replace(a, "United States of America", "")
    a = RTrim(a)
    ' Flatten line breaks, then collapse runs of spaces
    a = Replace(a, vbCrLf, " ")
    a = Replace(a, vbCr, " ")
    a = Replace(a, vbLf, " ")
    a = Replace(a, "  ", " ")
    a = Replace(a, "  ", " ")
    a = Replace(a, "  ", " ")
    NormAddress = a
End Function
```

# MS Outlook: Spamassassin Training with MIME Email (.EML) Files

Here's a VBA script that I'm using to train Spamassassin from Outlook. It saves email messages out to a file server, where the messages are used to train the filter.

The problem here is that Outlook doesn't save EML (MIME format) files. You can save messages as text, but lately, spammers have been loading messages with a lot of chaff text that looks like regular email. You can't train with that, because it might cause the filter to start mis-identifying legit email as spam. The chaff is usually in the HTML as white text, at a small font size, so the user never sees it, but the filter would.

The partial solution is to save the messages as regular MIME email, in an .EML file, with the HTML parts intact. Spamassassin seems to have code that treats obfuscated HTML correctly, so the white text is excluded from the training.

This code is very raw. There are plenty of things to fix, like error handling, but it is working right now. The code is set up not to save out text versions of the email. To use it, go to a folder, select the spam, and run the MarkAsSpam macro. This is intended to be used by the sysadmin.
I have learned that end-user spam filtering is hit and miss. Some people use spam filters to block legit email rather than unsubscribe from the messages.

```vba
Sub MarkAsHam()
    CopyMessagesToFile ("\\mailfilter\spamassassin-ham\")
End Sub

Sub MarkAsSpam()
    CopyMessagesToFile ("\\mailfilter\spamassassin-spam\")
End Sub

' Save the selected message(s) to the given folder **************************
Function CopyMessagesToFile(folderName As String)
    Dim myOLApp As Application
    Dim myNameSpace As NameSpace
    Dim myInbox As MAPIFolder
    Dim currentMessage As MailItem
    Dim errorReport As String
    Set myOLApp = CreateObject("Outlook.Application")
    Set myNameSpace = myOLApp.GetNamespace("MAPI")
    Set myInbox = myNameSpace.GetDefaultFolder(olFolderInbox)

    ' Figure out if the active window is a list of messages or one message
    ' in its own window
    On Error GoTo QuitIfError    ' But if there's a problem, skip it
    Select Case myOLApp.ActiveWindow.Class

    ' The active window is a list of messages (folder); this means there
    ' might be several selected messages
    Case olExplorer
        Debug.Print "list of messages"
        For Each currentMessage In myOLApp.ActiveExplorer.Selection
            Call writeAsFile(folderName, currentMessage)
        Next

    ' The active window is a message window, meaning there will only
    ' be one selected message (the one in this window)
    Case olInspector
        Call writeAsFile(folderName, myOLApp.ActiveInspector.CurrentItem)

    ' can't handle any other kind of window; anything else will be ignored
    End Select

QuitIfError:    ' Come here if there was some kind of problem
    Set myOLApp = Nothing
    Set myNameSpace = Nothing
    Set myInbox = Nothing
    Set currentMessage = Nothing
End Function

Sub writeAsFile(folderName As String, item As MailItem)
    On Error GoTo Bail
    Dim x As MailItem
    Dim fn As String
    Set x = item

    ' The commented-out block below saved a plain-text version; it's
    ' disabled so only the .EML version is written.
    'Let fn = folderName & Right(x.EntryID, 64) & ".txt"
    'Debug.Print "file will be " & fn
    'Open fn For Output As #1
    'Print #1, "From : " & x.SenderEmailAddress
    'Print #1, "To: " & x.To
    'Print #1, "Subject: " & x.Subject
    'Print #1, vbCrLf & vbCrLf
    'Print #1, x.body

    Let fn = folderName & Right(x.EntryID, 64) & ".eml"
    Debug.Print "file will be " & fn
    Open fn For Output As #2
    Print #2, "From : " & x.SenderEmailAddress
    Print #2, "To: " & x.To
    Print #2, "Subject: " & x.Subject
    Print #2, "MIME-Version: 1.0"
    Print #2, "Content-Type: multipart/alternative;"
    Print #2, " boundary = ""----=_NextPart_000_000D_01CCF6AD.D1159750"""
    Print #2, "Content-Language: en-us"
    Print #2, ""
    Print #2, "This is a multipart message in MIME format."
    Print #2, ""
    Print #2, "------=_NextPart_000_000D_01CCF6AD.D1159750"
    Print #2, "Content-Type: text/plain;"
    Print #2, " Charset = ""us-ascii"""
    Print #2, "Content-Transfer-Encoding: 7bit"
    Print #2, ""
    Print #2, item.body
    Print #2, "------=_NextPart_000_000D_01CCF6AD.D1159750"
    Print #2, "Content-Type: text/html;"
    Print #2, " Charset = ""UTF-8"""
    Print #2, "Content-Transfer-Encoding: 7-bit"
    Print #2, "Content-Disposition: inline"
    Print #2, ""
    Print #2, item.HTMLBody
    Print #2, "------=_NextPart_000_000D_01CCF6AD.D1159750--"
    On Error GoTo 0
Bail:
    Close #1
    Close #2
    Set item = Nothing
End Sub
```

# MS Access: Showing "Continued..." Conditionally at the Bottom of a Section in a Report

Maybe I'm missing something, but it looks like Access doesn't have this feature: putting "Continued..." or "More..." at the bottom of a section when the next section lands on the next page. If it exists, please comment or email me at johnk@ the domain name of this site. I seriously hope it exists.

I have this complex report that is a little non-standard, and here's how I did it.
The general technique is at this other post: Printing a Repeated Section Message like "Continued".

My function is this:

```vba
=IIf( ([txtDetailNum]>11 and [txtDetailNum]=MaxValue([NamedDelegates],[EligibleDelegates]) and [EligibleDelegates]<21) or ([txtDetailNum]=20) or ([txtDetailNum]=40) or ([txtDetailNum]=60), "Continued...","")
```

The MaxValue function is defined like this:

```vba
Public Function MaxValue(a, b)
    a = Val(a)
    b = Val(b)
    If (a > b) Then
        MaxValue = a
    Else
        MaxValue = b
    End If
End Function
```

MaxValue is a lot like the traditional max(), but it converts strings to numbers first, because values in Access reports can end up as strings.

The logic to show the message works for me, but there's a bug in it. When txtDetailNum is within a range where the list ends near the bottom of the page, it should show the message, because the footer gets bumped to the next page. That logic is expressed in the first part of the expression:

```vba
([txtDetailNum]>11 and [txtDetailNum]=MaxValue([NamedDelegates],[EligibleDelegates]) and [EligibleDelegates]<21)
```

(The MaxValue part deals with a data glitch where the number of named delegates > eligible delegates.)

So the entire expression should have lines like that throughout, in addition to txtDetailNum=20. It just turned out that my data didn't end in the high 30s or high 50s. A correct expression would be a bit more complex, and should use VBA. You'd define a function that returns True when "Continued..." should be printed. The logic would be something like this:

```vba
' Pseudocode: True when "Continued..." should print at the bottom
' of the current page. recordsPerPage, recordThatWouldTriggerBreak,
' and lastRow would come from the report.
Function printContinue(txtDetailNum)
    Dim pagePosition
    pagePosition = txtDetailNum Mod recordsPerPage
    If (pagePosition >= recordThatWouldTriggerBreak) And (txtDetailNum = lastRow) Then
        printContinue = True
    ElseIf (pagePosition = recordsPerPage - 1) Then
        printContinue = True
    Else
        printContinue = False
    End If
End Function
```

# MS Access: VBA CRC32 Function

Here's a CRC32 function based on the work at cCRC32. The main difference is that this is a function, and the crc32 table is not recalculated each time.
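An aside on verifying a port like this: CRC-32 (the common IEEE polynomial used by zip, PNG, and friends) ships in Python's standard library, so the VBA function can be sanity-checked against zlib. One wrinkle is that VBA Longs are signed 32-bit, so the VBA result is the signed view of zlib's unsigned value. This is only an illustration for cross-checking, not part of the Access code:

```python
import zlib

def crc32_reference(data: bytes) -> int:
    """Unsigned CRC-32 (IEEE polynomial), as zlib computes it."""
    return zlib.crc32(data) & 0xFFFFFFFF

def to_signed32(u: int) -> int:
    """View an unsigned 32-bit value the way a VBA Long would print it."""
    return u - 0x100000000 if u >= 0x80000000 else u

# "123456789" is the traditional CRC-32 check string
print(hex(crc32_reference(b"123456789")))  # 0xcbf43926
```

If the VBA CRC32 of "123456789" doesn't equal `to_signed32(0xCBF43926)`, the table or the shift logic has been mistranscribed.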
If there's a way to do constant arrays, I'd like to know. I haven't found anything online. Function CRC32(str As String) Dim crc32Table(256) As Long crc32Table(0) = 0 crc32Table(1) = 1996959894 crc32Table(2) = -301047508 crc32Table(3) = -1727442502 crc32Table(4) = 124634137 crc32Table(5) = 1886057615 crc32Table(6) = -379345611 crc32Table(7) = -1637575261 crc32Table(8) = 249268274 crc32Table(9) = 2044508324 crc32Table(10) = -522852066 crc32Table(11) = -1747789432 crc32Table(12) = 162941995 crc32Table(13) = 2125561021 crc32Table(14) = -407360249 crc32Table(15) = -1866523247 crc32Table(16) = 498536548 crc32Table(17) = 1789927666 crc32Table(18) = -205950648 crc32Table(19) = -2067906082 crc32Table(20) = 450548861 crc32Table(21) = 1843258603 crc32Table(22) = -187386543 crc32Table(23) = -2083289657 crc32Table(24) = 325883990 crc32Table(25) = 1684777152 crc32Table(26) = -43845254 crc32Table(27) = -1973040660 crc32Table(28) = 335633487 crc32Table(29) = 1661365465 crc32Table(30) = -99664541 crc32Table(31) = -1928851979 crc32Table(32) = 997073096 crc32Table(33) = 1281953886 crc32Table(34) = -715111964 crc32Table(35) = -1570279054 crc32Table(36) = 1006888145 crc32Table(37) = 1258607687 crc32Table(38) = -770865667 crc32Table(39) = -1526024853 crc32Table(40) = 901097722 crc32Table(41) = 1119000684 crc32Table(42) = -608450090 crc32Table(43) = -1396901568 crc32Table(44) = 853044451 crc32Table(45) = 1172266101 crc32Table(46) = -589951537 crc32Table(47) = -1412350631 crc32Table(48) = 651767980 crc32Table(49) = 1373503546 crc32Table(50) = -925412992 crc32Table(51) = -1076862698 crc32Table(52) = 565507253 crc32Table(53) = 1454621731 crc32Table(54) = -809855591 crc32Table(55) = -1195530993 crc32Table(56) = 671266974 crc32Table(57) = 1594198024 crc32Table(58) = -972236366 crc32Table(59) = -1324619484 crc32Table(60) = 795835527 crc32Table(61) = 1483230225 crc32Table(62) = -1050600021 crc32Table(63) = -1234817731 crc32Table(64) = 1994146192 crc32Table(65) = 31158534 crc32Table(66) = 
-1731059524 crc32Table(67) = -271249366 crc32Table(68) = 1907459465 crc32Table(69) = 112637215 crc32Table(70) = -1614814043 crc32Table(71) = -390540237 crc32Table(72) = 2013776290 crc32Table(73) = 251722036 crc32Table(74) = -1777751922 crc32Table(75) = -519137256 crc32Table(76) = 2137656763 crc32Table(77) = 141376813 crc32Table(78) = -1855689577 crc32Table(79) = -429695999 crc32Table(80) = 1802195444 crc32Table(81) = 476864866 crc32Table(82) = -2056965928 crc32Table(83) = -228458418 crc32Table(84) = 1812370925 crc32Table(85) = 453092731 crc32Table(86) = -2113342271 crc32Table(87) = -183516073 crc32Table(88) = 1706088902 crc32Table(89) = 314042704 crc32Table(90) = -1950435094 crc32Table(91) = -54949764 crc32Table(92) = 1658658271 crc32Table(93) = 366619977 crc32Table(94) = -1932296973 crc32Table(95) = -69972891 crc32Table(96) = 1303535960 crc32Table(97) = 984961486 crc32Table(98) = -1547960204 crc32Table(99) = -725929758 crc32Table(100) = 1256170817 crc32Table(101) = 1037604311 crc32Table(102) = -1529756563 crc32Table(103) = -740887301 crc32Table(104) = 1131014506 crc32Table(105) = 879679996 crc32Table(106) = -1385723834 crc32Table(107) = -631195440 crc32Table(108) = 1141124467 crc32Table(109) = 855842277 crc32Table(110) = -1442165665 crc32Table(111) = -586318647 crc32Table(112) = 1342533948 crc32Table(113) = 654459306 crc32Table(114) = -1106571248 crc32Table(115) = -921952122 crc32Table(116) = 1466479909 crc32Table(117) = 544179635 crc32Table(118) = -1184443383 crc32Table(119) = -832445281 crc32Table(120) = 1591671054 crc32Table(121) = 702138776 crc32Table(122) = -1328506846 crc32Table(123) = -942167884 crc32Table(124) = 1504918807 crc32Table(125) = 783551873 crc32Table(126) = -1212326853 crc32Table(127) = -1061524307 crc32Table(128) = -306674912 crc32Table(129) = -1698712650 crc32Table(130) = 62317068 crc32Table(131) = 1957810842 crc32Table(132) = -355121351 crc32Table(133) = -1647151185 crc32Table(134) = 81470997 crc32Table(135) = 1943803523 crc32Table(136) = 
-480048366 crc32Table(137) = -1805370492 crc32Table(138) = 225274430 crc32Table(139) = 2053790376 crc32Table(140) = -468791541 crc32Table(141) = -1828061283 crc32Table(142) = 167816743 crc32Table(143) = 2097651377 crc32Table(144) = -267414716 crc32Table(145) = -2029476910 crc32Table(146) = 503444072 crc32Table(147) = 1762050814 crc32Table(148) = -144550051 crc32Table(149) = -2140837941 crc32Table(150) = 426522225 crc32Table(151) = 1852507879 crc32Table(152) = -19653770 crc32Table(153) = -1982649376 crc32Table(154) = 282753626 crc32Table(155) = 1742555852 crc32Table(156) = -105259153 crc32Table(157) = -1900089351 crc32Table(158) = 397917763 crc32Table(159) = 1622183637 crc32Table(160) = -690576408 crc32Table(161) = -1580100738 crc32Table(162) = 953729732 crc32Table(163) = 1340076626 crc32Table(164) = -776247311 crc32Table(165) = -1497606297 crc32Table(166) = 1068828381 crc32Table(167) = 1219638859 crc32Table(168) = -670225446 crc32Table(169) = -1358292148 crc32Table(170) = 906185462 crc32Table(171) = 1090812512 crc32Table(172) = -547295293 crc32Table(173) = -1469587627 crc32Table(174) = 829329135 crc32Table(175) = 1181335161 crc32Table(176) = -882789492 crc32Table(177) = -1134132454 crc32Table(178) = 628085408 crc32Table(179) = 1382605366 crc32Table(180) = -871598187 crc32Table(181) = -1156888829 crc32Table(182) = 570562233 crc32Table(183) = 1426400815 crc32Table(184) = -977650754 crc32Table(185) = -1296233688 crc32Table(186) = 733239954 crc32Table(187) = 1555261956 crc32Table(188) = -1026031705 crc32Table(189) = -1244606671 crc32Table(190) = 752459403 crc32Table(191) = 1541320221 crc32Table(192) = -1687895376 crc32Table(193) = -328994266 crc32Table(194) = 1969922972 crc32Table(195) = 40735498 crc32Table(196) = -1677130071 crc32Table(197) = -351390145 crc32Table(198) = 1913087877 crc32Table(199) = 83908371 crc32Table(200) = -1782625662 crc32Table(201) = -491226604 crc32Table(202) = 2075208622 crc32Table(203) = 213261112 crc32Table(204) = -1831694693 crc32Table(205) 
= -438977011 crc32Table(206) = 2094854071 crc32Table(207) = 198958881 crc32Table(208) = -2032938284 crc32Table(209) = -237706686 crc32Table(210) = 1759359992 crc32Table(211) = 534414190 crc32Table(212) = -2118248755 crc32Table(213) = -155638181 crc32Table(214) = 1873836001 crc32Table(215) = 414664567 crc32Table(216) = -2012718362 crc32Table(217) = -15766928 crc32Table(218) = 1711684554 crc32Table(219) = 285281116 crc32Table(220) = -1889165569 crc32Table(221) = -127750551 crc32Table(222) = 1634467795 crc32Table(223) = 376229701 crc32Table(224) = -1609899400 crc32Table(225) = -686959890 crc32Table(226) = 1308918612 crc32Table(227) = 956543938 crc32Table(228) = -1486412191 crc32Table(229) = -799009033 crc32Table(230) = 1231636301 crc32Table(231) = 1047427035 crc32Table(232) = -1362007478 crc32Table(233) = -640263460 crc32Table(234) = 1088359270 crc32Table(235) = 936918000 crc32Table(236) = -1447252397 crc32Table(237) = -558129467 crc32Table(238) = 1202900863 crc32Table(239) = 817233897 crc32Table(240) = -1111625188 crc32Table(241) = -893730166 crc32Table(242) = 1404277552 crc32Table(243) = 615818150 crc32Table(244) = -1160759803 crc32Table(245) = -841546093 crc32Table(246) = 1423857449 crc32Table(247) = 601450431 crc32Table(248) = -1285129682 crc32Table(249) = -1000256840 crc32Table(250) = 1567103746 crc32Table(251) = 711928724 crc32Table(252) = -1274298825 crc32Table(253) = -1022587231 crc32Table(254) = 1510334235 crc32Table(255) = 755167117 Dim crc32Result As Long crc32Result = &HFFFFFFFF Dim i As Integer Dim iLookup As Integer Dim buffer() As Byte buffer = StrConv(str, vbFromUnicode) For i = LBound(buffer) To UBound(buffer) iLookup = (crc32Result And &HFF) Xor buffer(i) crc32Result = ((crc32Result And &HFFFFFF00) \ &H100) And 16777215 ' nasty shr 8 with vb :/ crc32Result = crc32Result Xor crc32Table(iLookup) Next i CRC32 = Not (crc32Result) End Function  # Mini-HOWTO: mini_snmpd, a small snmpd for embedded systems like OpenWrt mini_snmpd is a GPL snmpd by Robert 
Ernst (http://members.aon.at/linuxfreak/linux/mini_snmpd.html). It doesn't include docs, so here are some starter docs. The OpenWrt version can be installed from the package manager, but it doesn't seem to include a startup script for init.d, so I'll try to whip one up here as well.

First, you'll need command line access to the device, so if it's an OpenWrt router, install the dropbear ssh server, and log in via ssh. SSH in as root.

To see the help for mini_snmpd, use the -h option:

mini_snmpd -h

You should see:

usage: mini_snmpd [options]
-p, --udp-port nnn     set the UDP port to bind to (161)
-P, --tcp-port nnn     set the TCP port to bind to (161)
-c, --community nnn    set the community string (public)
-D, --description nnn  set the system description (empty)
-V, --vendor nnn       set the system vendor (empty)
-L, --location nnn     set the system location (empty)
-C, --contact nnn      set the system contact (empty)
-d, --disks nnn        set the disks to monitor (/)
-i, --interfaces nnn   set the network interfaces to monitor (lo)
-t, --timeout nnn      set the timeout for MIB updates (1 second)
-a, --auth             require authentication (thus SNMP version 2c)
-v, --verbose          verbose syslog messages
-l, --licensing        print licensing info and exit
-h, --help             print this help and exit

The default values are in parentheses.

On a router, you usually want to see the traffic going through the network interfaces. Here's a run of mini_snmpd that exposes those values:

mini_snmpd -i eth0.1,wl0,br-lan

Your command line may differ, depending on your router and network configuration. Mine was OpenWrt on a Linksys WRT54G or GL.

Back on your desktop (or whatever computer will be querying for SNMP stats), use the net-snmp tools to poll for data. I learned how from the tutorial "Simple SNMP with Linux", by Jason Philbrook. Run this:

snmpwalk -v 1 -c public 192.168.111.1

That IP address is my router. -v means version, and -c means community name: version 1, community "public".
mini_snmpd seems to ignore the community name, but it must be supplied. The output I got was:

iso.3.6.1.2.1.1.1.0 = ""
iso.3.6.1.2.1.1.2.0 = OID: iso.3.6.1.4.1
iso.3.6.1.2.1.1.3.0 = Timeticks: (412) 0:00:04.12
iso.3.6.1.2.1.1.4.0 = ""
iso.3.6.1.2.1.1.5.0 = STRING: "OpenWrt"
iso.3.6.1.2.1.1.6.0 = ""
iso.3.6.1.2.1.2.1.0 = INTEGER: 3
iso.3.6.1.2.1.2.2.1.1.1 = INTEGER: 1
iso.3.6.1.2.1.2.2.1.1.2 = INTEGER: 2
iso.3.6.1.2.1.2.2.1.1.3 = INTEGER: 3
iso.3.6.1.2.1.2.2.1.2.1 = STRING: "eth0.1"
iso.3.6.1.2.1.2.2.1.2.2 = STRING: "wl0"
iso.3.6.1.2.1.2.2.1.2.3 = STRING: "br-lan"
iso.3.6.1.2.1.2.2.1.8.1 = INTEGER: 1
iso.3.6.1.2.1.2.2.1.8.2 = INTEGER: 1
iso.3.6.1.2.1.2.2.1.8.3 = INTEGER: 1
iso.3.6.1.2.1.2.2.1.10.1 = Counter32: 1758591601
iso.3.6.1.2.1.2.2.1.10.2 = Counter32: 3436817368
iso.3.6.1.2.1.2.2.1.10.3 = Counter32: 1913312114
iso.3.6.1.2.1.2.2.1.11.1 = Counter32: 122466684
iso.3.6.1.2.1.2.2.1.11.2 = Counter32: 6121670
iso.3.6.1.2.1.2.2.1.11.3 = Counter32: 110670073
iso.3.6.1.2.1.2.2.1.13.1 = Counter32: 0
iso.3.6.1.2.1.2.2.1.13.2 = Counter32: 0
iso.3.6.1.2.1.2.2.1.13.3 = Counter32: 0
iso.3.6.1.2.1.2.2.1.14.1 = Counter32: 0
iso.3.6.1.2.1.2.2.1.14.2 = Counter32: 85
iso.3.6.1.2.1.2.2.1.14.3 = Counter32: 0
iso.3.6.1.2.1.2.2.1.16.1 = Counter32: 4073775119
iso.3.6.1.2.1.2.2.1.16.2 = Counter32: 980016090
iso.3.6.1.2.1.2.2.1.16.3 = Counter32: 2726650206
iso.3.6.1.2.1.2.2.1.17.1 = Counter32: 112229579
iso.3.6.1.2.1.2.2.1.17.2 = Counter32: 6270244
iso.3.6.1.2.1.2.2.1.17.3 = Counter32: 120445972
iso.3.6.1.2.1.2.2.1.19.1 = Counter32: 0
iso.3.6.1.2.1.2.2.1.19.2 = Counter32: 0
iso.3.6.1.2.1.2.2.1.19.3 = Counter32: 0
iso.3.6.1.2.1.2.2.1.20.1 = Counter32: 0
iso.3.6.1.2.1.2.2.1.20.2 = Counter32: 11015
iso.3.6.1.2.1.2.2.1.20.3 = Counter32: 0
iso.3.6.1.2.1.25.1.1.0 = Timeticks: (748634215) 86 days, 15:32:22.15

The first part is the object ID (OID), the second part is the data type, and the third part (after the colon) is the value. The OID is like a path to a value. What are these values?
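If you want to pull a specific counter out of output like this before wiring it into a grapher, a little awk does it. The following is a sketch, with a short sample of the walk inlined in a heredoc rather than a live snmpwalk run:

```shell
# Pull the inbound octet counters (ifInOctets, under ...2.2.1.10) out of
# saved snmpwalk output. The heredoc stands in for a real
# "snmpwalk -v 1 -c public 192.168.111.1 > /tmp/walk.txt" run.
cat > /tmp/walk.txt <<'EOF'
iso.3.6.1.2.1.2.2.1.2.1 = STRING: "eth0.1"
iso.3.6.1.2.1.2.2.1.10.1 = Counter32: 1758591601
iso.3.6.1.2.1.2.2.1.10.2 = Counter32: 3436817368
EOF

# Print "interface-index counter" for each ifInOctets row.
awk '/2\.2\.1\.10\./ { n = split($1, a, "."); print a[n], $4 }' /tmp/walk.txt
# → 1 1758591601
#   2 3436817368
```

The last element of the OID is the interface index, which matches up with the interface names shown under ...2.2.1.2.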
There's a database for them; this page shows the names for the network counters at iso.3.6.1.2.1.2.2.1:

http://www.alvestrand.no/objectid/1.3.6.1.2.1.2.2.1.html

That site has a ton of OIDs in it. Armed with this knowledge, you should be able to program Cacti or MRTG to extract data from your router and graph it.

## An init.d for OpenWrt

The old version of OpenWrt I'm using didn't create a script in init.d for me. So here's an init.d script, /etc/init.d/mini_snmpd:

#!/bin/sh /etc/rc.common
# Copyright (C) 2006 OpenWrt.org
START=50

start() {
    mini_snmpd -i eth0.1,wl0,br-lan &
}

stop() {
    killall -9 mini_snmpd
}

It's based on init.d/cron. You also need to run this:

cd /etc/rc.d/
ln -s /etc/init.d/mini_snmpd S50mini_snmpd
/etc/init.d/mini_snmpd start

That creates a symlink to cause mini_snmpd to start on boot, then starts the daemon. You can now log out of the router.

# Mobile Phone Developer Sites

A couple of mobile phone business and development links. One came from TechRepublic, speculating about who might buy (the newly revived) Palm.

Podcast: Will the $99 smartphone trigger a price war? [Guess not. It seems to be a price war at the $199 price point.]

Correcting BREW and J2ME - a 2008 article that gives background about the competing BREW and J2ME markets, and the then-emergent iPhone business model.

Links to misc app stores (mobile or not): Linspire CNR, GetJar, Boost Mobile, ATT MediaMall, Sprint Software Store, Handmark, Ovi (Nokia), Android Market, T-Mobile T-Zones, Motorola Solutions.

A bunch of development links after the jump.

## A Random List of Stuff

Maemo from Nokia
Palm Pre
(Google) Android
Apple iPhone
Java ME
Blackberry
(Qualcomm) BREW
Symbian OS
Windows Mobile
Qt
Geos (defunct OS)

Also, there are some higher-level application platforms.

Yahoo Blueprint
Plusmo
Motorola WebUI - one of a few different WebKit-based solutions out there.
WebKit, Apple's browser engine, which is getting a lot of application features added.
Ansca Corona, an iPhone SDK that uses the Lua language.

GetJar mobile phone market stats: the summary is, 75% goes to MIDP2/CLDC1.1, and the rest is mostly Symbian. So Java still dominates, but the iPhone is the emergent platform that is leading innovation.

# Move Files into a Directory Named for the Modification Date

This script is being used to move files around in a Maildir. A bunch of spam goes into the "new" directory. When this script is run, it moves each file into a directory named for the date when the file was modified (its mtime).

#! /usr/bin/perl
# move files into directories named by date
# a file modified on 2009-07-11 will be moved into a directory named "2009-07-11".

$dir = $ARGV[0];
$dir = 'new' if ! $dir;

opendir DH, $dir;

while ($fn = readdir DH) {
    next if ($fn =~ /^\./);

    $filename = "$dir/$fn";
    ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,$atime,$mtime,$ctime,$blksize,$blocks) = stat($filename);
    ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime($mtime);
    $mday = sprintf('%02d', $mday);
    $mon  = sprintf('%02d', ++$mon);
    $year = 1900 + $year;
    $destdir  = "$year-$mon-$mday";
    $destname = "$destdir/$fn";

    mkdir $destdir if (! -e $destdir);
    rename $filename, $destname;
}
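For comparison, here's a rough shell version of the same idea. It's a sketch, assuming GNU coreutils (`date -r` prints a file's mtime) and a flat directory of regular files; the function name is mine:

```shell
#!/bin/sh
# move_by_date DIR: move each file in DIR into a directory in the
# current directory named for the file's modification date.
move_by_date() {
    for f in "${1:-new}"/*; do
        [ -f "$f" ] || continue
        d=$(date -r "$f" +%Y-%m-%d)   # GNU date: mtime of "$f"
        mkdir -p "$d"
        mv "$f" "$d/"
    done
}

# Demo: a file last modified on 2009-07-11 lands in ./2009-07-11/
mkdir -p new
touch -d '2009-07-11 12:00' new/msg1
move_by_date new
```

Like the Perl script, this creates the date directories in the current directory, not inside the source directory.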


# Moving from Backups to SAN/DRBD + Archival Backup

As data storage needs increase, the volume of data will slowly overwhelm the ability to back it up.

BackupExec saving 500 GB of data over a gigabit ethernet link to a RAID NAS (a Buffalo TeraStation running Linux with software RAID 5 and Samba) takes 15 hours. Even with this miserable performance, I'd be surprised if a better NAS improved things much - I suspect the real bottleneck is BackupExec, which probably does a lot of work to prepare the backup file.

So, in the real world, I'll be looking at backups that take a full day by the time I need to back up one terabyte of data.

One possible solution for this is to use a Storage Area Network (SAN). A SAN is an enterprise-level technology that federates multiple storage servers to behave like a single, large file server. The system is redundant, so losing a server doesn't destroy the network. SAN is a block-level technology: the network emulates a disk. The main problem with SANs is the high price.

An emergent alternative is DRBD, a network-mirrored block device that runs on Linux and BSD. It creates a block-level device backed by local storage, and that storage is synchronized over the network with a remote copy.

In a typical network, the Linux or BSD system is accessed over Samba file sharing, a cross-platform file sharing solution that works with Windows. With some management scripts, it should be possible to create a file share that's redundant and highly available.

This redundant system would be better than RAID. An entire machine could be removed from the system, and it would still be available.

With this higher level of reliability, it would be feasible to reduce the frequency of backups, even to the point of demoting backups to the role of archival backups. Additionally, backups could be performed against one machine, while file access for users is provided on the other machine, improving overall performance.

## Bottlenecks

Data being written to disk is limited by the mechanical speed of the disk, which is typically 7200 RPM. Even if you have gigabit ethernet and SATA2, you're limited by the speed of the disk. Even if you spend the money for 10,000 RPM disks, you're still not going to overcome this limit in a significant way. And if you invest in 10 Gbit copper or fiber optic, you're still limited by 6 Gbit/s SATA3.

Real-world disk performance is around 60 MB/s, or roughly 500 Mbit/s. That's around 50% of a gigabit ethernet link, and around 5x as fast as a 100 Mbit ethernet link. RAID further degrades disk performance on writes.

So, even if you remove the network and disk-interface bottlenecks, you're going to be limited to around 60 MB/s, and that's before you consider other performance hits related to writing data. At that rate, the fastest possible backup of one terabyte of data is around 4.6 hours.
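The arithmetic is easy to sanity-check in the shell (a sketch, taking 1 TB as 10^12 bytes and assuming a sustained 60 MB/s):

```shell
# Time to move 1 TB at a sustained 60 MB/s.
bytes=$((1000 * 1000 * 1000 * 1000))   # 1 TB
rate=$((60 * 1000 * 1000))             # 60 MB/s
secs=$((bytes / rate))
echo "$secs seconds"                   # 16666 seconds
echo "$((secs / 3600)) hours"          # ~4 hours (4.6, truncated)
```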

Real world performance is significantly worse as noted above.

# MySQL Optimization

Here's a noob-to-noob optimization trick. Suppose you have a database table with, say, 200,000 records, and you regularly select on multiple criteria. The rule is to put the most specific WHERE clause first, and the least specific last. The goal is to cut the search set down to something small, and then search through that smaller set. Put all your queries into this order, then create a composite index over the same keys to speed up the search even more.

Here are some before and after shots, based on real queries (from sf-active):

Before:

select * from tb where display='t' and parent_id=0 and id > 198000 limit 0,30


After:

select * from tb where id > 198000 and parent_id=0 and display='t' limit 0,30


This revision will now cause the first clause to eliminate most of the rows from the table, leaving only around 2,000 rows to scan. The second clause, parent_id, eliminates 50% of the remainder. Display='t' is the least selective clause.

Also, it wasn't noted, but there are already indexes for display and parent_id. So we aren't starting with absolutely nothing.

Before:

select * from tb where display='t' and parent_id=0 limit 0,30


After:

select * from tb where parent_id=0 and display='t' limit 0,30


Also do this:

alter table tb add index (parent_id, display)


That looks virtually identical. Again, this is a real-world situation, where the query was built-up dynamically. The optimization here is that I created an index that will speed up the select. The index matches the order of the query, so the query optimizer will be able to find the optimization easily.

Additionally, it would be a good thing to put all the clauses in all the queries into this order, from most specific to least specific, to gain the maximum optimization. I suspect the query optimizer already does this automatically, but, being meticulous about this seems like good mental discipline.

The real-world effect of this simple optimization, which took around two hours to complete, was dramatic. The slow query had been bogging down the server, with queries taking thousands of seconds to execute (or in our situation, to time-out, and require the admin to go in and kill the thread). Now, the query barely shows up in the process list, and the real-world speed feels like it takes less than five seconds to execute through the web (meaning, it includes dns lookup, tcp connection, and page rendering). Typically, it takes one second, and feels pretty fast.
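To see whether the optimizer actually uses the new index, MySQL's EXPLAIN is the tool. Here's a hedged sketch; the table and column names (tb, parent_id, display) are the ones from this article, so substitute your own schema, and feed the file to your own server:

```shell
# Write the index change plus an EXPLAIN check into a SQL file.
# Table/column names (tb, parent_id, display) are from the article;
# substitute your own schema.
cat > /tmp/tune.sql <<'SQL'
ALTER TABLE tb ADD INDEX (parent_id, display);
EXPLAIN SELECT * FROM tb
 WHERE parent_id = 0 AND display = 't'
 LIMIT 0,30;
SQL

# Run it against your database when ready, e.g.:
#   mysql mydatabase < /tmp/tune.sql
cat /tmp/tune.sql
```

In the EXPLAIN output, the "key" column should name the new composite index; if it shows NULL, the optimizer isn't using it.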

# NVU - text fields for copy-and-paste

Here's how to create one of those text fields with HTML that the user's supposed to copy-and-paste into their page. It's not hard.

Create a form.

Add a Text Area. Give it a name, and set the rows and columns.

From the text area dialog, click on the "Advanced Properties..."

Click on the Javascript tab.

Add a property named "onclick" with the value of "this.focus(); this.select();".

Click OK.

Click OK.

NOTE: I found a serious problem - NVU's code reformatting will break the HTML within MySpace, because NVU inserts newlines. To fix the problem, you have to save out the source, join all the lines, and upload the file manually.

# NVU Installer for Ubuntu Feisty Fawn

For some reason or other, they don't have NVU for Ubuntu 7. You can install it from the .deb file. Instructions are at:

http://linuxdesktopsoftware.com/2007/04/nvu_install_on_ubuntu_feisty.html

# NVU to Create HTML Code to Insert into Websites

NVU is a free (as in GPL) Hypertext Markup Language (HTML) editor available at nvu.com.

You can use NVU to generate bits of HTML code that you paste into a web form, like when you update your profile or post an article. This can be done on any site that allows you to post in HTML.

## Step by Step

1. Start up NVU.

2. Write your content in the normal editing view.

3. When you're ready to post, click on the "SOURCE" tab along the bottom of the editing area. You should see a screen with your text, surrounded and interspersed with text that looks like <this>. Those things are "HTML tags".

Your goal is to extract some tags and text. To do this, first try to find the first <BODY> tag. It should be at around line 8 (the lines are numbered along the left edge of the editing area).

Then locate the closing (final) body tag, around one line before the end of the page. It should look like "</body>". The slash in front indicates it's a closing tag. (The other body tag was the opening tag.)

What you have to do is copy all the code between the body tags, but not including the body tags.

4. Most people click and drag to select the code, then right-click and copy.

An alternative way is to click before the first character to select, then hold shift and click after the last character to select. Then, press control-C to copy.

5. Paste the code in your clipboard into the web form.

## Images

The best way to incorporate images into your posts is to first, prepare your images, and upload them to a server. This can be an image server, or your own server, or an image-host. (On Indymedia sites, you can see if they let you upload images in a batch, then post the story afterward.)

By putting them on a server first, you are giving each image a permanent, public location. That location is the image's URL. Then, you can include that image in the HTML. That's how HTML works - images are references to the image files.

Please note that this is completely different from MS Word. In Word and other word processors, images are included into the document. The image data becomes part of the document, and the image is a copy of the original file. This is a subtle but important difference. Please unlearn any preconceptions you might have learned from MS Word. Also, don't use MS Word to create your HTML. NVU is a better tool.

So, let's assume you've put the images onto a server somewhere, and know the URLs.

To place the image in your document:

1. Click where you want the image to be, and click the Image icon in the toolbar.

2. Type in the URL to the image in the Location: field.

If you are lazy like me, you just open up the image in your browser, then copy the URL from the address bar, and paste it into the Location field.

3. Type in a description into the Alternative Text box, or click "Don't use alternate text."

4. Click OK.

# Net Art, Electronic Art, Computer Art, Web Browser Art, & Hybrids

This is a list of various art projects. People put their portfolios online. I apologize to anyone not listed; this list was started in 2009, and there's been a lot of interesting stuff going on for a couple of decades. This is a personal list, not a comprehensive list.

Rhizome Art Base

Here's one possible reason:

http://support.microsoft.com/default.aspx?scid=kb;en-us;290684

Here's a fix:

http://www.annoyances.org/exec/forum/winxp/t1130775940

# PC Hardware Failures

I'm noticing some patterns in PC failure. Here they are.

Batteries fail, causing date errors or, worse, booting problems. These fail after 2-3 years, and can be replaced easily for around $10.

Hard drives fail after around 5 years, causing much pain. Laptop drives can fail after just a year, and tend to develop bad sectors due to mishaps with the laptop. For maximum happiness, replace the drives before they fail, and use the originals as archive drives.

Motherboards sometimes fail, but not on any predictable schedule. Motherboards can fail if the capacitors dry out or start to bulge and explode. This is more common than it should be, but at the same time, all caps tend to fail after years of use. (I've also seen small ethernet switches fail due to bad capacitors.)

Floppy drives occasionally fail, but, more often, the floppies fail. They last around 3 years, and then some stop working.

Power supplies fail. In PCs, the power supply is often the culprit when a computer doesn't work. Good power supplies last for many years, but the stock ones often fail after around 3-5 years. End users can replace these.

The small switched power adapters used with hard drives and laptops fail, a lot. Most of the time, the problem is that the cable is bent and the wires within are broken. More intensive use, like in a server room, tends to lead to the adapters expiring from overwork. The real fix is to buy gear that has a big power supply with a fan.

Fans fail. These things spin until they start to rattle. They're cheap and easy to replace.

Monitors fail, but it's usually the power supply that goes out. If it's not that, then the monitor is a goner.

Mice clog up. Use laser mice.

Keyboards get crumbs in them, and they lose keys. Some have intermittent electrical problems that lead to weird typing problems. These can be cleared up by disconnecting and reconnecting the ribbon cable connecting the keys to the controller.

# Parity in Computer Data, What is It?
Parity, in computer data, is a bit that's set or cleared so the total number of "1" bits is either even or odd. It's an extra bit, added as a check on the data: if the parity doesn't come out right, you assume the data is bad. It's often used in data communications, and was a very visible feature during the old modem and BBS days.

Even parity means that a bit is added so the total number of "1" bits is even. Odd parity means that a bit is added so the total number of "1" bits is odd. So this:

1010100

With even parity, is:

10101001

With odd parity, is:

10101000

Back in the old days, ASCII had only 7 bits. (Indeed, it still has only 7 bits, but the 256-character and larger character sets have dominated.) The 8th bit was used for parity. Then, by the 1970s, modems with 8 data bits and a 9th parity bit were common.

The shorthand terminology that describes the number of data bits, the parity, and the stop bits is still pretty common. (A stop bit is an added bit that's held low, like a pause.) Here are some examples:

8N1 - 8 data bits, no parity, 1 stop bit.
7E1 - 7 data bits, even parity, 1 stop bit.
8O1 - 8 data bits, odd parity, 1 stop bit.

Parity bits are overhead, but they help detect problems with the data. Of course, if the parity bit is also flipped, or two data bits are flipped, the error won't be detected. That, however, is rare, because noisy connections tend to produce many errors, not just one.

Also, for larger file transfers, techniques like XMODEM and Kermit were invented. They would send the data, and then calculate a checksum. If the checksum failed, the entire block of data was bad and a resend could be requested. Ethernet also uses a checksum (a CRC) on each frame.

RAID 5 uses parity, but in a different way: it computes a parity across the corresponding data blocks on the other drives, so a lost drive can be reconstructed. RAID 6 is like RAID 5, but adds a second parity. This way, you can lose two drives and still recover.

Parity is sometimes misspelled "parody" by people who have poor spelling.
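The even/odd rule is easy to demonstrate. Here's a small shell sketch (the function name is mine, not from any standard tool):

```shell
# add_parity BITS even|odd - append a parity bit to a string of 0s and 1s.
add_parity() {
    bits=$1; mode=$2
    ones=$(printf '%s' "$bits" | tr -cd 1 | wc -c)     # count the "1" bits
    if [ $((ones % 2)) -eq 0 ]; then p=0; else p=1; fi # p makes the total even
    [ "$mode" = odd ] && p=$((1 - p))                  # flip for odd parity
    printf '%s%s\n' "$bits" "$p"
}

add_parity 1010100 even   # prints 10101001
add_parity 1010100 odd    # prints 10101000
```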
Computer people may do the opposite, and spell parody as "parity", because they don't see the word "parody" anywhere. The two words are pronounced similarly.

# Payment Card Industry Data Security Standard (PCI DSS), getting with the program.

These are notes for achieving conformance with PCI DSS. PCI DSS is a bit of private-market bureaucracy that basically amounts to an agreement to use secure practices, and to implement a system with security enabled and insecure services and features disabled. The website was heavy on bureaucracy, and the technical info was hard to find.

First, you need to get the PCI DSS standard, v2.0. It's a PDF download.

Next, get nmap on your server and your desktop. You have to scan the server over and over. With nmap, do this:

nmap -A -T4 myserver.com

You'll get output like this:

Nmap scan report for myserver.com (0.0.0.0)
Host is up (0.072s latency).
rDNS record for 0.0.0.0 myserver.com
Not shown: 927 filtered ports, 64 closed ports
PORT     STATE SERVICE  VERSION
80/tcp   open  http     Apache httpd 2.2.17 ((FreeBSD) mod_ssl/2.2.17 OpenSSL/1.0.0d PHP/5.3.6 with Suhosin-Patch)
|_html-title: 403 Forbidden
110/tcp  open  pop3     Courier pop3d
|_pop3-capabilities: USER STLS IMPLEMENTATION(Courier Mail Server) UIDL PIPELINING LOGIN-DELAY(10) TOP OK(K Here s what I can do)
143/tcp  open  imap     Courier Imapd (released 2011)
|_imap-capabilities: THREAD=ORDEREDSUBJECT QUOTA STARTTLS THREAD=REFERENCES UIDPLUS ACL2=UNION SORT ACL IMAP4rev1 IDLE NAMESPACE CHILDREN
443/tcp  open  ssl/http Apache httpd 2.2.17 ((FreeBSD) mod_ssl/2.2.17 OpenSSL/1.0.0d PHP/5.3.6 with Suhosin-Patch)
|_sslv2: server still supports SSLv2
|_html-title: Site doesn't have a title (text/html; charset=iso-8859-1).
465/tcp  open  ssl/smtp qmail smtpd
|_sslv2: server still supports SSLv2
| smtp-commands: EHLO c.slaptech.net, AUTH LOGIN CRAM-MD5 PLAIN, AUTH=LOGIN CRAM-MD5 PLAIN, STARTTLS, PIPELINING, 8BITMIME
|_HELP qmail home page: http://pobox.com/~djb/qmail.html
993/tcp  open  ssl/imap Courier Imapd (released 2011)
|_imap-capabilities: THREAD=ORDEREDSUBJECT QUOTA AUTH=PLAIN THREAD=REFERENCES UIDPLUS ACL2=UNION SORT ACL IMAP4rev1 IDLE NAMESPACE CHILDREN
995/tcp  open  ssl/pop3 Courier pop3d
|_pop3-capabilities: USER IMPLEMENTATION(Courier Mail Server) UIDL PIPELINING OK(K Here s what I can do) TOP LOGIN-DELAY(10)
8000/tcp open  http     Icecast streaming media server
|_html-title: Icecast Streaming Media Server
Service Info: OSs: Unix, FreeBSD

My first goal is to get rid of the SSLv2 warning. Some websites said this was a PCI violation. To do this, first read the mod_ssl docs. Then, you need to alter the configuration file a bit. My file was /usr/local/etc/apache22/extras/httpd-ssl.conf. I added this line to the global config:

SSLProtocol ALL -SSLv2

That enables all but the SSLv2 protocol, which is the oldest protocol and is considered insecure. The newer ones are SSLv3 and TLSv1.

Also, alter the ciphers. Look for the SSLCipherSuite line and change it to:

SSLCipherSuite TLSv1:IDEA:SHA1:HIGH:-LOW:-MEDIUM

I'm not sure I have that exactly right, but it's mostly about enabling TLSv1, and disabling the LOW- and MEDIUM-grade ciphers. "TLSv1" above is an alias for a number of different ciphers. See the SSLCipherSuite section in the mod_ssl docs for more information -- it's too complex to describe here. In short, negotiating an SSL connection involves several phases, and in each phase, you can use different ciphers. Some are considered stronger than others. Exchanging data with these ciphers requires that both the client and the server have the required programs to handle the ciphers.
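To see which ciphers a specification string like that expands to, OpenSSL can list them locally. A sketch; the exact cipher names, and which class keywords are accepted, vary by OpenSSL version ("HIGH" and "!aNULL" are long-standing class names):

```shell
# List the ciphers matched by a cipher-spec string, one per line.
# "HIGH" selects strong ciphers; "!aNULL" drops unauthenticated ones.
openssl ciphers 'HIGH:!aNULL' | tr ':' '\n'
```

Comparing that list against what your server actually offers is a quick way to check a cipher change before rescanning with nmap.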
That's why there are choices -- the programs will try to work with what they've got, while preferring the most secure ciphers. Your job is to disable the less secure protocol, SSLv2, and to leave out the less secure ciphers. Read the mod_ssl docs for more details, including how to list the available ciphers.

Next, you have to establish a new virtual server for the web store. This requires creating a new Apache conf file, using the default file as a template. The main thing about making an SSL site is getting the certificates, putting them in a safe place, setting the permissions, and getting the server to come up. Just for starters, get a certificate from CAcert.org or make a self-signed certificate. You can "upgrade" to a commercial certificate after you've configured the server correctly.

But, before you can do that, you need to allocate an IP address for the website. This is a limitation of Apache and OpenSSL, at this time. Until recently, there was no way to run name-based virtual hosts with SSL; the problem was that SSL was negotiated before the hostname was sent to the server, so you could only have one certificate per IP address. Today, there's a feature called Server Name Indication (SNI) that allows it. Read about gnutls and SNI, and Apache with SNI. Also read Wikipedia on SNI - it indicates that no version of IE on Windows XP supports SNI. Therefore we can't use SNI on the server; we must use IP addresses for vhosting.

Lock down the default virtual host. (I'm not sure if it complies with the export laws as stated in the agreement, but it probably does.)

# Perl Watchdog Script for Apache

This is a rough watchdog script that restarts Apache on the local machine when the website gets slow. If a GET to the URL fails, or takes longer than 60 seconds, the local web server is restarted.

I started to use this after installing a new version of Apache.
The system hadn't been properly tuned, and the side effect was that Apache would be nearly wedged while the rest of the system was merely slow. This happens when Apache or MySQL is getting wedged, but not because the system is overloaded with too much traffic. Why does it happen? It's hard to say - it could be a configuration problem, a DOS attack, a hack, or a software error. Regardless, it's more important to keep the system responsive, so the service gets restarted.

This script needs some more logging, so it can snapshot different system stats (system load, memory, network connections, processes, etc.), before it's really useful for figuring out why the server is unresponsive.

#! /usr/bin/perl

our $LOGFILE = '/var/log/watchapache.log';
our $URL = 'http://your.url.here/';
our $APACHE_RESTART = '/etc/init.d/apache22 restart';
our $SLEEP = 600;

######################################

require WWW::Mechanize;
require Time::Progress;
require POSIX;

our $mech = WWW::Mechanize->new( onerror => \&failed );
$mech->stack_depth(0);
our $p = new Time::Progress;

sub test_server() {
    $p->restart;
    $mech->get( $URL );
    my $elapsed = $p->elapsed;
    write_log( $mech->status() . " elapsed $elapsed" );
    if ($elapsed > 60) {
        restart_apache();
    }
}

sub failed() {
    write_log( "Failed " . $mech->status() );
    if ($mech->status() eq '500') {
        ## assume it's a dead server
        restart_apache();
    }
}

sub restart_apache() {
    write_log("restarting apache");
    system( $APACHE_RESTART );
}

sub write_log($) {
    my $line = shift @_;
    if (-e $LOGFILE) {
        open FH, '>>', $LOGFILE;
    } else {
        open FH, '>', $LOGFILE;
    }
    print FH POSIX::strftime('%D %T', localtime);
    print FH " $line\n";
    close FH;
}

for (;;) {
    test_server();
    sleep( $SLEEP );
}
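The same check can be done without Perl at all. Here's a hedged sketch of one pass of the loop using curl; the URL and restart command are placeholders, just as they are in the script:

```shell
# One check: fetch the URL, failing if it errors or takes over 60 seconds.
URL="http://your.url.here/"
RESTART="/etc/init.d/apache22 restart"

if curl -fsS --max-time 60 -o /dev/null "$URL"; then
    echo "ok"
else
    echo "slow or down - would run: $RESTART"
    # $RESTART    # uncomment to actually restart
fi
```

Wrap that in a `while true; do ...; sleep 600; done` loop, or run it from cron, and you have roughly the same watchdog.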


# Phone Number to Call that Repeats the Line's Number to You

The telephone number that will tell you the number you're calling from is called an Automatic Number Announcement Circuit, or ANAC.

They're useful if you're trying to identify the number associated with a dial tone.

It's important to label ALL the jacks correctly, and use a scheme like A1 for analog phones, D1 for digital phones, and L1 for LAN (Ethernet) jacks. When you're testing jacks, start off by plugging in the highest-voltage, highest current device first. That's usually the PBX phones, which use 2 wires, 48V and enough current to power the chips inside.

Then, after that, it's the regular POTS telephones, which ring at 90+ volts, are powered at 48V, and sit below 10V when off hook.

Last is Ethernet, which runs between 2V and 5V. If you're testing Ethernet with a computer, consider using an external adapter which you can sacrifice.

Generally, there's no hazard plugging a POTS phone into a PBX, but you might hear weird noises.

If you see a "harmonica", it's probably plugged into an RJ21, a 50-pin "centronics" type plug. These will do 25 phones at 2 wires per phone, or 12 at 4 wires per phone. They terminate at the "harmonica", or sometimes at a patch panel.

# Printers

Computer printers suck. It's almost impossible to tell whether you're going to get a good one or a big dud. Generally, the good ones are expensive and the losers are cheap. Some brands are better than others, but the models within a brand vary more than the models across brands. There are good Brothers, and there are crappy HPs, even though people generally think of Brother as inferior to HP.

Best bet is to buy at the midrange, for products aimed at small offices. Products at the low end aimed at the home market won't last. Also, there are sometimes some great bargains - but that could be because of design flaws.

Example: Samsung ML-2510. Works great when new, but it seems like a design flaw causes the unit to get hot, and the rubber parts to wear out quickly, necessitating cleaning and possibly a refurb.

On the other hand, I had a small Panasonic laser printer that lasted for years and printed thousands of pages. I eventually gave it away because it was a Mac printer and I had moved to PC and Linux. The print quality was middling and the speed was slow, but the machinery was solid.

All new printers work great. Not all printers work after three years of steady use, but some will.

The old HP 4000 series and the 4 and 5 series printers were and are awesome. They easily last a decade, and require only one or two roller repairs in that time. These were office-grade printers with few features, and priced in the $800 range. You can find them for <$100 now. Only problem is that they require parallel printer ports or an ethernet card.

HP quality is variable - it's kind of like cars. Some years they're good, other years not so good. Midrange HPs sell on the used market for $200 - $400 and are easy to evaluate - read the reviews.

## Impulse purchase and e-commerce – Online Consumer Behaviors ***

https://www.msu.edu/~liulian/files/TC862.doc

This is an older paper from 2001 or 2002. 40% of online purchases are unplanned. 75% of buyers stated that the purchase was price-driven. Analysis uses the Consumption Impulse Formation and Enactment model (CIFE) by Utpal M. Daholakia, a model to understand impulse purchases. To create the ideal environment: category links, simple checkout, recommendation system, virtual checkout, product exposure, highlight feature products, bundles.

## On the Negative Effects of E-Commerce: A Sociocognitive Exploration of Unregulated On-line Buying ***

http://jcmc.indiana.edu/vol6/issue3/larose.html

An older paper from around 2001 by Dr. Robert LaRose, who specializes in media and telecommunications. This doesn't discuss ecommerce as much as psychological factors like addiction, compulsive behavior, and shopping. The ecommerce part seems dated.

## What causes customers to buy on impulse? **

2002 paper by User Interface Engineering. I think its results are distorted by the fact that the participants were given money. Shoppers think of things to buy as they shop. 87% of money spent on impulse purchases resulted from category navigation. The other 13% came from using search. Site searches narrow focus too much. Well-designed navigation exposes customers to more products, resulting in more impulse buys.

## Zara.com 2% conversion rate *

Zara.com has achieved a 2% conversion rate (meaning that 2% of people who click to that site via a paid ad purchase something). Industry average is 1%.

## Some other pages and papers I haven't had time to read and summarize

http://www.ebrc.fi/kuvat/23-35_04.pdf

http://jcmc.indiana.edu/vol10/issue1/kim_larose.html (this looks difficult, but has lots of data)

http://www.wharton.universia.net/index.cfm?fa=viewArticle&id=1642&language=english

http://www.ebrc.info/kuvat/2056_04p.pdf

# Resolve IP Addresses to DNS Names

Sometimes, you have textual data, like log files, with IP addresses. You sometimes want this data to show hostnames instead.

This script converts IP addresses in the standard input to hostnames. (Script is based on one I found in perlmonks.org.)


#!/usr/bin/perl -w
#
# Resolve IP addresses in web logs.
# Diego Zamboni, Feb 7, 2000
# John Kawakami, May 12, 2008

use Socket;

# Local domain name
$localdomain = 'slaptech.net';

while (my $l = <>) {
    if ($l =~ /^(.*?)(\d+\.\d+\.\d+\.\d+)(.*?)$/) {
        $pre = $1;
        $address = $2;
        $post = $3;
        if ($cache{$address}) {
            $addr = $cache{$address};
        } else {
            $addr = inet_aton($address);
            if ($addr) {
                $name = gethostbyaddr($addr, AF_INET);
                if ($name) {
                    # NOTE: To ensure the veracity of $name, we really
                    # would need to do a gethostbyname on it and compare
                    # the result with the original address, to prevent
                    # someone spoofing us with false DNS information.
                    # For this application, we don't care too much,
                    # so we don't do this.

                    # Fix local names
                    if ($name !~ /\./) {
                        $name .= '.' . $localdomain;
                    }
                    $cache{$address} = $name;
                    $addr = $name;
                } else {
                    $addr = $address;
                }
            } else {
                $addr = $address;
            }
        }
        print $pre . $addr . $post . "\n";
    } else {
        print $l;
    }
}


To use it, save the code into the file "resolve", do a "chmod u+x resolve" on it, and then try the following:

last -10 | ./resolve


# Risks of Web Services to Applications

Increasingly, applications are dependent on external web services. Web services are great: you can get current data on demand, relatively inexpensively, because you can purchase it in small increments. Web services are typically not only data services, but also perform data processing functions.

Web services also represent a risk, because the services can be discontinued or change.

This is a diagram of a typical application that integrates web services with local data.

Suppose one web service goes away. Any features of the application that depend on the web service stop working. Either an error message will be thrown, or the features will lack data, or the data will be obsolete.

Suppose two services fail. More of the application will fail.

If there are dependencies between the web services - i.e., the application is a mash-up that combines data from two web services - then the failure of either service affects the other.

To relieve some of this risk, one uses caching. You create a local database that will hold copies of the remote data from the web service. Requests are routed to the cache, and if the data isn't present or is outdated, the request is forwarded to the web service.
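As a concrete illustration, here's a minimal read-through cache sketched in Python. This is hypothetical: `fetch_from_service` stands in for the real web service call, and the one-hour freshness window is an assumption for the example, not a recommendation.

```python
import time

CACHE_TTL = 3600  # assume remote data stays fresh for an hour

cache = {}  # key -> (timestamp, value)

def fetch_from_service(key):
    # placeholder for the real web service request
    return "remote value for " + key

def get(key):
    """Return cached data, refreshing from the web service when stale."""
    entry = cache.get(key)
    if entry is not None:
        timestamp, value = entry
        if time.time() - timestamp < CACHE_TTL:
            return value  # fresh enough; no network traffic needed
    try:
        value = fetch_from_service(key)
        cache[key] = (time.time(), value)
        return value
    except OSError:
        # service outage: fall back to stale data if we have any
        if entry is not None:
            return entry[1]
        raise
```

The point of the pattern is that a service outage only degrades freshness; the application keeps running on the last good copy.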

The cache handles service outages well, but it doesn't handle changes to the services. The two most common changes are upgrades to the service, and changes to the company that provides the service.

Upgrades typically don't cause existing services to be discontinued immediately. Companies will maintain the existing web service with some kind of adapter. However, as time marches on, a legacy system will eventually be discontinued when few customers use it. I don't know what the lifecycle for a legacy service is, but it probably depends on the company providing the service. Government agencies seem to support data formats for longer than ten years.

Startup companies seem to last around three years - and when the remains of a company are merged with another company, it's rare that legacy systems are maintained as-is (they only want the customers, not their extant systems).

Major changes to services will require changes to the application. Either an inexpensive shim will be developed to adapt the new data to the old cache system... or the cache system and the application's access to the cache will need to be rewritten.

If the service vanishes, then it will be necessary to find a replacement service.

The lifecycle of business software is growing. In the 1990s, it would have been reasonable to expect software to become obsolete in five years. Today, it's common to run software that's nearly a decade old. In finance, utilities, government, and other slow-changing institutions, software lifecycles are measured in decades.

So, it should be expected that all software lifecycles grow longer as the software becomes institutionalized.

Exposure to the risk of web services changing increases with the length of the software lifecycle, and the damage a change inflicts tends to grow geometrically when the web services are integrated with each other.

The local cache, and its behavior, are the only insurance against the inevitability of changing web services.

AttachmentSize
appdiagram.png25.62 KB

# Screen Scraping With wget (and Mailarchiva)

I was testing a new product called Mailarchiva, and I misunderstood the instructions. The upshot was that a mailbox full of messages was moved into Mailarchiva, and I wanted to restore them to the mailbox.

Mailarchiva comes with a tool to decrypt its message store, but it didn't work. The problem was that the main product and the utility package got desynchronized, and the one tool I needed stopped working (because a method's type signature changed). Also, despite being an open source project, they didn't have sources for the utilities up on sf.net, so I couldn't rebuild the program to make it work.

Not being a major Java programmer, I had a hard time coaxing the system to the point where it would run without an exception. The problem was that the utility's libraries expected one format for the message store, and the server's expected another. It was getting really difficult.

I had some manually produced backups, but not of the current month. (I didn't follow my own advice not to test with live data.)

You just can't win, sometimes.

The solution, sort of, was to use the website downloader, wget, to interact with the app via its web interface, and use that to download the messages to files. Screen scraping.

First, I found a page with great examples:
http://drupal.org/node/118759#comment-286253

Then, a quick visit to the wget man page:
http://www.gnu.org/software/wget/manual/html_node/Types-of-Files.html#Types-of-Files

Here's the short version of how to do it:

The first step is to log in via wget and capture the session cookie.

The second step is to figure out how to download the messages.

The third is to figure out the range of pages in the results, and then write a loop to recursively download the messages from each set.

Then, finally, copy the .EML files up to the server via Outlook Express.

Here's the long version:

First, you have to submit a web form, and get a session id in a cookie. Here's the command I used:

wget -S --post-data='j_username=admin&j_password=fakepass' http://192.168.1.103:8090/mailarchiva/j_security_check


192.168.1.103 is the IP address of my test installation.

The --post-data option lets you submit the login form, as if you were typing it in and submitting it. To find the URL to submit to, you look at the source of the login form.

Then, you inspect the output, looking for the Cookie. Then, concoct a longer, more complex command to submit the search form:

wget --header="Cookie: JSESSIONID=62141726A04B7C8BDE24C32514EB19F3; Path=/mailarchiva" --post-data='criteria[0].field=subject&criteria[0].method=all&criteria[0].query=&dateType=sentdate&after=1/1/09 1:00 AM&before=12/18/09 11:59 PM&submit.search=Search' http://192.168.1.103:8090/mailarchiva/search.do


Note that we're passing the cookie back.

Inspecting the resultant file will reveal that the search worked! Next, a recursive wget downloads the messages linked from the search results:

wget -r -l 2 -A "*viewmail.do*" -A "*downloadmessage.do*" -R "signoff.do" -R "search.do" -R "configurationform.do" --header="Cookie: JSESSIONID=62141726A04B7C8BDE24C32514EB19F3; Path=/mailarchiva" --post-data='criteria[0].field=subject&criteria[0].method=all&criteria[0].query=&dateType=sentdate&after=1/1/09 1:00 AM&before=12/18/09 11:59 PM&submit.search=Search' http://192.168.1.103:8090/mailarchiva/search.do


That pretty much does what I want, but I need to do it for a bunch of pages. The quick solution is to use the browser to find out what the last message is, and then write the following shell script:

for i in 1 2 3 4 5 ; do
wget -r -l 2 -A '*viewmail.do*' -A '*downloadmessage.do*' -R 'signoff.do' -R 'configurationform.do' --header='Cookie: JSESSIONID=62141726A04B7C8BDE24C32514EB19F3; Path=/mailarchiva' --post-data='criteria[0].field=subject&criteria[0].method=all&criteria[0].query=&dateType=sentdate&after=1/1/01 1:00 AM&before=12/18/09 11:59 PM&page='$i http://192.168.1.103:8090/mailarchiva/search.do
done


Note that a parameter was added to the post: page. A parameter was also removed: the submit value. Submitting the old value seemed to prevent the paging. There's probably a branch in the code based on the type of "submit" you're sending, because there are a few different buttons, with different effects. Again, that's discovered by reading the sources and experimenting.

So, I ran the script and waited a long time. Then, I shared the data via Samba (I coded this on a Linux box, but ran the application on Windows). A nice side effect was that the shared files displayed DOS 8.3 filenames. So, the messages, which were originally named "blah.do?id=21341342334.eml", became "BADJFU~5.EML".

To upload, I used Outlook Express. Despite its bad reputation, OE is good at interacting with IMAP mailboxes, and its support for the .EML file format seems to be good.

Wget saved the day (but it was a long day).

Lesson learned, or "lessons refreshed", is really what happened. I should have set up a test account, put mail into it, then archived it. Additionally, I should remember that when dealing with "enterprise" software, it's not going to work like Windows or Mac (or even Linux) software. Larger businesses are assumed to have certain processes that SOHO businesses don't.

This would be a perfect application for a web service. It would avoid all the program execution problems. Instead of accessing the data through a command line application, you access it over the network, using a simple interface.
Additionally, this kludgy rescue would have been impossible if the application had been written to use a Swing GUI or a native GUI. The web interface made it possible to scrape the data out of the system.

As for Mailarchiva - if you are trying to archive your own mail server, it seems to be a good product. The docs could use some work :) I found others, but Mailarchiva running on a Linux box would probably be the most stable solution. The bad news is that it's not intended for archiving personal email accounts like Gmail, AOL and ISP accounts. So, it wasn't the right tool for me.

What I really need is a free/cheap archiver for products like Gmail. It would both mirror and archive the IMAP folders, but allow the user to hold on to emails for as long as they wanted. So far, what I've found either doesn't do folders, or doesn't do archiving. Archiving is just saving every single email it sees, and retaining messages even if they're deleted.

# Share a Printer from XP Pro to Vista Home

It's pretty difficult to share from an XP Professional machine participating in a domain to a cheap laptop running Vista Home (Basic). There are a lot of things to do or the entire system won't work. Windows has a lot of granular security that can trip you up.

* Make sure the printer is shared on XP Pro. From XP Pro, go to \\machinename and see that the printer is shared.
* Set the Vista Home (or Windows 7 Home) laptop's workgroup to the same name as the domain.
* Turn on network discovery and sharing on Vista or 7. (This may not matter - but it can help you spot problems like nonexistent computers on the network.)
* Make sure that any firewall on the XP Pro machine allows Windows file and print sharing through.
* Make sure that you can ping from Vista Home to the XP Pro machine. Laptops connecting via WiFi may not be on the same network!
* If access is anonymous, make sure the machine's Guest account (not a domain account) is active.
* If access is with a username (the more common situation), make usernames on the XP machine that match Vista's usernames. You can set these with the same password. You create these accounts in Control Panel -> Users.
* Make sure that access from the network is allowed. This is set in Control Panel -> Administrative Tools -> Local Security Policy. Look in Local Policies -> User Rights Assignment.
* You may need to get drivers for Vista or Windows 7. When you install the printer, you should use one of the "universal" or "global" print drivers offered by vendors.

Once you have access, double click on the printer icon and the drivers should install. When completed, the icon should appear in your printers. Print a test page.

# Shared Memory Example

Here's one for the noobs (from a noob). This demonstrates the use of shared memory.

It's a program that spawns 10 children, and each one gets a special "babytalk" word to say. Each waits a random amount of time, and then writes its word into shared memory. Each child loops forever. The parent loops forever, and every two seconds, prints whatever is in shared memory. The last child to write to memory "wins" and is "heard" by the parent.

Shared memory is a file that's treated like memory (or memory that happens to be written to a file). The filename is the name of the memory. You use mmap() like you would use malloc(). This is useful because you can duplicate a data structure across processes. (I'm thinking of using it for a kind of "scoreboard" where child processes write their results into shared memory.)
Here's the code:

#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

struct d {
    char word[20];
};

char babytalk[10][6] = { "baa", "boo", "waaah", "urp", "eep", "naah", "yeee", "coo", "guh", "ooh" };

void child_babble(struct d *shared, char *word)
{
    srand((unsigned int)(size_t) word);  /* seed each child differently */
    for (;;) {
        strncpy(shared->word, word, strlen(word)+1);
        sleep( rand() & 0x5 );
    }
}

int main()
{
    struct d *shared;
    int fd;

    // create and size shared memory
    fd = open("/tmp/sharedmem", O_CREAT|O_TRUNC|O_RDWR, 0666);
    printf("fd: %d\n", fd);
    lseek(fd, sizeof(struct d)-1, SEEK_SET);
    write(fd, "\0", 1);

    // turn the file into shared memory
    shared = mmap( NULL, sizeof(struct d), PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if (shared == MAP_FAILED) {
        printf("ERROR: %d\n", errno);
    }
    printf("shared: %p\n", (void *) shared);
    strncpy(shared->word, "INIT", 5);

    // spawn 10 children
    int i;
    for (i = 0; i < 10; i++) {
        if (fork() == 0)
            child_babble( shared, babytalk[i] );
    }
    for (;;) {
        printf("%s\n", shared->word );
        sleep( 2 );
    }
}

AttachmentSize
mem.c1.15 KB

# Simple Templating Language in PHP

A few years back, there was a trend in the PHP community to make alternative templating languages that ran inside PHP. This was so the designers could create HTML templates, and include bits of code to display data. The best was probably Smarty.

After a while with this, a counter-trend emerged, of rejecting adding yet-another-language to the system. After all, PHP was a templating language. Some web frameworks used PHP as the templating language, but simply asked that only a tiny subset of the syntax be used. CodeIgniter and Savant did this. (So did the never-released Slaptech code generator.)

I was firmly in this latter camp. There are already too many languages involved with PHP: PHP, Javascript, HTML, CSS, and XML. Templating systems are slower, too.

The world has changed, though.
Today, due to AJAX, you need to produce lists of data encoded into XML, or into fragments of HTML. You can easily do this with regular PHP... except that PHP can sometimes look sloppy, and leave you wanting a simple templating language.

What's below is an extremely limited templating language, implemented in a single function. Additionally, there are two more functions that will apply the template to arrays and iterators. If you copy this code to a file, and run it on the server, it'll demo each function. More programming blather after the code.

<?php
# an extremely minimalist templating language
#
# $tpl = 'text{interpolate}text';
# $output = tpl_merge($tpl, array('interpolate'=>'text'));
# // $output is 'texttexttext'.

echo tpl_merge( 'Hello, {name}.', array( 'name' => 'world' ) );

function tpl_merge( $t, $v )
{
    $o = $t;
    $find = array();
    $repl = array();
    foreach( $v as $var => $val )
    {
        $find[] = '{' . $var . '}';
        $repl[] = $val;
    }
    $o = str_replace( $find, $repl, $o );
    return $o;
}

echo tpl_merge( '<p>Hello, {name}.</p>',
    array(
        'name' => '{first} {last}',
        'first' => 'Joe',
        'last' => 'Blow',
    ) );

# a template merger that applies the template to an array of arrays.

echo tpl_merge_array( '<p>Hello, {name}.</p>',
    array(
        array( 'name' => 'John', ),
        array( 'name' => 'Rosa', ),
    ) );

function tpl_merge_array( $t, $a )
{
    $o = '';
    foreach( $a as $element )
        $o .= tpl_merge( $t, $element );
    return $o;
}

# A similar template merger that works with iterators.
# An iterator is defined, minimally, as an object that has a next() method
# that returns the next item, and false past the last element.

$c = new Collection();
$c->add( array( 'name' => 'Gloria' ) );
$c->add( array( 'name' => 'Steve' ) );
echo tpl_merge_iterator( '<p>Hello, {name}.</p>', $c );

class Collection
{
    var $a;
    function Collection()
    {
        $this->a = array();
    }
    function add( $a )
    {
        $this->a[] = $a;
    }
    function reset()
    {
        reset( $this->a );
    }
    function next()
    {
        $val = current( $this->a );
        if ( null === key( $this->a ) )
            return false;
        next( $this->a );
        return $val;
    }
}

function tpl_merge_iterator( $t, $it )
{
    $o = '';
    while( $a = $it->next() )
        $o .= tpl_merge( $t, $a );
    return $o;
}
So, clearly, you can use these functions to build pages in a functional-language style. Just define templates and immediately apply them to iterators that wrap around queries. Producing html or xml from queries is simplified. Best of all (for me) you can write more code in a functional style than in the dreaded OO style.

echo tpl_merge_iterator( 'template{here}', query('select here from foobar where here>100') );

It's not really that terse, but, the idea is, you're not writing any more loops. All that is hidden.

# Stop Recording Bash History

Here's a script based on the information at http://www.cyberciti.biz/tips/shell-root-user-check-script.html.

It erases your history, and then tries to alter /etc/profile to stop recording history for everyone. Run it as a user and as root for the full effect.

AttachmentSize
fix_bash_history.444 bytes

# Strip Non-Numeric Characters from Data

This Javascript widget strips non-numeric characters from the input. The result will be a space-separated list of numbers. This is useful for extracting information from log files, dumps of data, and similar text.

Paste your data here:
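For scripting the same cleanup outside the browser, here's a Python sketch of the idea (this mimics what the widget does; it is not the widget's actual source):

```python
import re

def strip_non_numeric(text):
    """Replace every run of non-digit characters with a single space."""
    return re.sub(r"[^0-9]+", " ", text).strip()

# e.g. pull the numbers out of a log-ish line
print(strip_non_numeric("error 404 at 10:23:01"))
```

The result is a space-separated list of numbers, just as the widget produces.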



# Telephone Number Normalizers: fix phone numbers into a common format

It's common to get a list of names and phone numbers in a spreadsheet or from the web, and the formatting varies. In the US, people don't use a standard format consistently. Lately, they have taken to making phone numbers look like domain names or IP addresses, for example: 415.555.1212. This function normalizes phone numbers to look like this: 213-555-1212 x1234. The code is structured so multiple regexes perform the matching, allowing for easier modification. (This code was written in Excel, but should work in any VBA application.)
' Convert almost any phone-like string into a normalized form.
' The form is AAA-EEE-NNNN xPBXX
' This works only for US telephone numbers, but it's structured so
' it's not too hard to alter for other formats (or other idiosyncratic
' data entry persons).
' Requires Microsoft VBScript Regular Expressions 5.5
Function NormalTel(Phone As String, Optional areacode As String) As String
Dim parts(4) As String
Dim re As RegExp
Dim mat As MatchCollection
Dim phAreacode As String
Dim phExchange As String
Dim phNumber As String
Dim phExtension As String

Phone = RTrim(Phone)
Phone = Replace(Phone, Chr(160), " ") ' replace nbsp with regular space

' no areacodes
'123-4567
Set re = New RegExp
re.Pattern = "^(\d\d\d)[ .-](\d\d\d\d)[.,]*$"
Set mat = re.Execute(Phone)
If mat.Count > 0 Then
If (areacode <> "") Then
phAreacode = areacode
Else
phAreacode = "213"
End If
phExchange = mat(0).SubMatches(0)
phNumber = mat(0).SubMatches(1)
phExtension = ""
End If

'123-4567x12345
re.Pattern = "^(\d\d\d)[ .-]*(\d\d\d\d)\s*x(\d+)[.,]*$"
Set mat = re.Execute(Phone)
If mat.Count > 0 Then
If (areacode <> "") Then
phAreacode = areacode
Else
phAreacode = "213"
End If
phExchange = mat(0).SubMatches(0)
phNumber = mat(0).SubMatches(1)
phExtension = mat(0).SubMatches(2)
End If

' no pbx extensions
'(123) 456-1234
re.Pattern = "^\((\d\d\d)\)[ ]*(\d\d\d)[ .-](\d\d\d\d)[.,]*$"
Set mat = re.Execute(Phone)
If mat.Count > 0 Then
phAreacode = mat(0).SubMatches(0)
phExchange = mat(0).SubMatches(1)
phNumber = mat(0).SubMatches(2)
phExtension = ""
End If

'123-456-1234
re.Pattern = "^(\d\d\d)[.-](\d\d\d)[ .-](\d\d\d\d)[.,]*$"
Set mat = re.Execute(Phone)
If mat.Count > 0 Then
phAreacode = mat(0).SubMatches(0)
phExchange = mat(0).SubMatches(1)
phNumber = mat(0).SubMatches(2)
phExtension = ""
End If

' with pbx extensions
'(123) 123-1234 x1234
re.Pattern = "^\((\d\d\d)\)[ ]*(\d\d\d)[ .-](\d\d\d\d)[, .]*(x|ext|ext.)[ ]*(\d+)$"
re.IgnoreCase = True
Set mat = re.Execute(Phone)
If mat.Count > 0 Then
phAreacode = mat(0).SubMatches(0)
phExchange = mat(0).SubMatches(1)
phNumber = mat(0).SubMatches(2)
phExtension = mat(0).SubMatches(4)
End If

'123.234.2344x1234
re.Pattern = "^(\d\d\d)[ .-](\d\d\d)[ .-](\d\d\d\d)[, .]*(x|ext|ext.)[ ]*(\d+)$"
re.IgnoreCase = True
Set mat = re.Execute(Phone)
If mat.Count > 0 Then
phAreacode = mat(0).SubMatches(0)
phExchange = mat(0).SubMatches(1)
phNumber = mat(0).SubMatches(2)
phExtension = mat(0).SubMatches(4)
End If

If (phExtension <> "") Then
NormalTel = phAreacode & "-" & phExchange & "-" & phNumber & " x" & phExtension
Else
NormalTel = phAreacode & "-" & phExchange & "-" & phNumber
End If

' No number was detected, so the result is just the dashes.  Return the input as-is if it wasn't empty.
If NormalTel = "--" Then
If (Phone <> "") Then
NormalTel = Phone
Else
NormalTel = ""
End If
End If
End Function
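For comparison, the same approach can be sketched in Python. This is not a port of the VBA above, just a simplified take on the idea; the `normal_tel` name is made up, and the hardcoded default area code "213" (which the VBA also uses) is illustrative.

```python
import re

def normal_tel(phone, areacode="213"):
    """Normalize a US phone number to the AAA-EEE-NNNN xPBX form."""
    original = phone.strip()
    phone = original
    # pull out an optional extension first: "x123", "ext 123", "ext. 123"
    ext = ""
    m = re.search(r"(?:x|ext\.?)\s*(\d+)\s*$", phone, re.IGNORECASE)
    if m:
        ext = m.group(1)
        phone = phone[:m.start()]
    digits = re.sub(r"\D", "", phone)   # keep only the digits
    if len(digits) == 7:
        digits = areacode + digits       # assume the default area code
    if len(digits) != 10:
        return original                  # unrecognized: return input unchanged
    result = "%s-%s-%s" % (digits[0:3], digits[3:6], digits[6:10])
    if ext:
        result += " x" + ext
    return result
```

Stripping all non-digits first trades the VBA version's per-format regexes for one generic path; the multi-regex VBA design is easier to tune for an idiosyncratic data-entry person, while this sketch is shorter but less discriminating.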


# The Value of Practice

Nothing beats practice. No amount of reading documentation and theory will teach as much as that same material combined with a system to play on. A good tutorial is even better.

I really undervalued this until recently, when I started to set up our new network. While theory was good, using the hardware sped up my learning by a magnitude. If I had to put a value on it, I'd say that it's worth around $1,500 of my time to buy something rather than merely study the documentation. (Of course, without the docs, it's pretty pointless - you learn to use the gear like someone who doesn't read the docs.) So I could do 3 or 4 nights with the docs, but at that point, I need the system.

I'm learning LVM, and using a tutorial (or two) is teaching me a ton. LVM is kind of complex, because it's a couple layers of indirection between the logical volumes and the physical disk. With indirection, the tradeoff is usually between flexibility and complexity -- the more flexibility you get, the more complex it is to comprehend.

The only way to get a handle on how complex a system is, is to use it. So, only by practicing on LVM, and trying different levels of complexity, can I get a hint of what is probably "too complex". That's why it's necessary to sit down and program rather than read about programming, and sit at a computer or virtual machine instance and play around with the system. Maybe the granularity of the object model is too fine, or maybe it's not. Maybe applying many functions to the array is okay, once you really slow down and read it. Maybe RAID5 is good, and maybe it's not.

# Turning California WARN PDFs into Text

This was an odd project: taking several PDFs of layoff data and turning them into text, so they might be used more like a database. This info should be offered up by the state as a database, but it's not (at least it wasn't to me). I ended up using a PDF to Text application to generate text files, then wrote these scripts to scrape the data out of the text. My goal was to dig up all the unionized workplaces.

The WARN act is a law that requires employers to give 60 days notice of any coming mass layoffs.
I don't recall the exact numbers, but it applies to businesses that have a pretty large number of workers.

These scripts are basically complete, but running them requires moving them into the right directories. Study the sources to figure this out.

split.pl: (splits the text file into individual records)

#! /usr/bin/perl
open FH, ">/dev/null";
while (<>)
{
if ($_ =~ m#([^ ]+.*[^ ]+?).+?(\d+?)\s+(\d+/\d+/\d+)\s#)
{
$comp = $1;
$count = $2;
$date = $3;

$comp =~ s/[^\d\w]+/-/g;
$comp =~ s/[-]+$//;
$date =~ s/[^\d]/-/g;
$name = "$comp.$count.$date.txt";
open FH, ">splits/$name";
print FH $_;
}
else
{
print FH $_;
}
}
close FH;

parse.pl: (read each file and extract the interesting parts)

#! /usr/bin/perl
$line = <STDIN>;  ## 1st line is the company, count, date, and part of the
## location.  split on ctl-U character.
$line =~ s/[\r\n]//g;
$line =~ s/ $//g;
chomp $line;
#print "**$line**\n";
$company = ( $line =~ /(.+?)\s+?\d+/ )[0];
if ($company !~ /\cU\cU/)
{
$company =~ s/\s+$//g;
$line2 = <STDIN>;
$line2 =~ s/\s\cU\cU\s*//g;
$company .= "$line2";
$company =~ s/\s+$//g;
}
$company =~ s/\s+\cU\cU//g;
#print "**$company**\n";
($count, $date, $location) = ( $line =~ m#$company[\s\cU]+?(\d+?)\s+?(\d+?/\d+?/\d+?)\s+?(\w.+?)$# );
#print "**$count**$date**$location**\n";

$line = <STDIN>;
$line =~ s/[\r\n]//g;
chomp $line;

($street, $location2) = ( $line =~ /(.+?)\s+?\cU\cU\s+([A-Z ]+?)$/ );
##print "**$street**$location2**\n";
if (! $location2)
{
($street) = ( $line =~ /(.+?)\s+?\cU\cU/ );
##print "**$street**\n";
}
else
{
$location = $location . ' ' . $location2;
}
#print "**$street**$location**\n";

$line = <STDIN>;
$line =~ s/[\r\n]//g;
chomp $line;

($city, $state, $zip) = ( $line =~ /^([\w ]+?), (\w\w) ([\d-]+?)$/ );
#print "**$city**$state**$zip**\n";

if (! $zip)
{
($city, $state, $zip, $extra) = ( $line =~ /^([\w ]+?), (\w\w) ([\d-]+?)\s+(.+)$/ );
$location .= ' ' . $extra;
}

while ($line = <STDIN>)
{
goto BAILOUT if ($line =~ /^Company Contact Name and Telephone Number/ );
$line =~ s/[\r\n]//g;
chomp $line;
#print "**$line**\n";
}

BAILOUT:
$line = <STDIN>;
$line =~ s/[\r\n]//g;
chomp $line;
($cname, $layoff_or_closure) = ( $line =~ /^(.+?)\s+?\cU\cU\s+?Layoff or Closure:  (\w+?)$/ );
#print "**$cname**$layoff_or_closure**\n";

$company_contact = <STDIN>;
$company_contact =~ s/[\r\n]//g;
chomp $company_contact;
#print "**$company_contact**\n";

while ( ($line = <STDIN>) !~ /^Union Representation/ )
{
## accumulate contact info here
}
$line =~ s/[\r\n]//g;
chomp $line;
#print "1**$line**\n";

while ($line = <STDIN>)
{
goto CONT if ($line =~ /^Name and Address of Union/);
}
CONT:

$union_contact = "";
while ($line = <STDIN>)
{
goto CONT2 if $line =~ /^Job Title/ ;
$line = "" if ($line =~ /Name and Address of Union Representing Employees/);
$line =~ s/[\r\n]//g;
chomp $line;
$union_contact = "$union_contact\r\n$line" if ($union_contact ne "");
$union_contact = $line if ($union_contact eq "");
}
#print "**$parts**\n";
CONT2:

print "\"$company\",$layoff_or_closure,$count,\"$date\",\"$location\",\"$street\",$city,$state,$zip,\"$cname\",$company_contact,\"$union_contact\"\r\n";


make.sh

#! /bin/bash

for i in *.txt ; do
echo $i
./parse.pl < $i >> report.csv
done

AttachmentSize
layoff.jpg30.83 KB

# UPS Mishap

Woe to the sysadmin who trusts their UPS to work as expected.  It turns out that some UPSs won't warn you when the battery is dead or low.  You find out it's not functioning when you unplug it.

So, as a matter of course, it's necessary to test UPSs.  It's not easy: you have to schedule system downtime and shut down the computer, then unplug the UPS from the wall, plug a device into it, and see how long it stays up and whether the UPS beeps.  If the battery is dead, then it's time to buy a new set of batteries.

# Ubuntu KVM Switching Problem, and Fix

KVM switchers read the Scroll Lock LED, switching computers when they see the LED toggle. Normally, you toggle it by pressing Scroll Lock twice. Ubuntu doesn't accept Scroll Lock, and doesn't turn the LED on. Not finding a way to enable it, I opted to use the suggestion in the linked article, and created a KVM switching script.

The script here creates a new command, switchkvm.

echo "xset led on; sleep .25; xset led off" > switchkvm
chmod a+x switchkvm

I put an icon in my toolbar so it's one click away. Attached is an ugly icon for it.

AttachmentSize
swindows.png551 bytes

# Ubuntu Linux PS/2 Mouse Stopped Working

After upgrading to a new kernel, my USB keyboard stopped working. Arrgh, not again. I plugged in my spare PS/2 keyboard and started troubleshooting. The problem, it turned out, was that a version of the Ubuntu server kernel was installed, and that didn't boot up with USB. This wasn't the first time this happened, so I deleted those kernels and ran grub-mkconfig to create a new grub.cfg.

Rebooting brought back the USB keyboard, but killed the PS/2 mouse.

I tried a couple suggested fixes. First was to use the "i8042.nopnp" option. You do this by adding "i8042.nopnp=1" to the kernel line in grub.cfg. That didn't work.

Second was to add "psmouse" to the /etc/modules file, so the PS/2 mouse driver gets loaded at boot time. This didn't work either.

The problem turned out to be what PS/2 device was detected. The kernel always found the "KBD" device. The motherboard had a single PS/2 port which could support either the keyboard or the mouse.

The solution was to power the computer off, then on again. The port reconfigured itself to support a mouse.

What happened: the port self-configured to a keyboard when I was fixing the keyboard issue. I swapped in a mouse while the computer was powered up, and it never got reconfigured, even though I rebooted several times.

# VBA: Transforming XML Error Messages into VBA Errors (Raising or Throwing Errors)

This is trial code that I used to translate an error from a Yahoo web service into a COM ErrObject.

It's not real XML parsing, but it's good enough for this purpose. If an error message is sent, we extract the message and then use Err.Raise to throw an error.

Sub testRegex()
Dim response As String
response = "<?xml version=""1.0"" encoding=""UTF-8""?>:+" & vbCrLf & _
"<Error xmlns=""urn:yahoo:api"">" & vbCrLf & _
"   The following errors were detected:" & vbCrLf & _
"        <Message>unable to parse location</Message>" & vbCrLf & _
"</Error>" & vbCrLf & _
"<!-- ws01.ydn.gq1.yahoo.com uncompressed/chunked Tue Aug 11 15:44:44 PDT 2009 -->"
e = RegExMatch(response, "<Error xmlns=""urn:yahoo:api"">\s*.*\s*.*<Message>(.+)</Message>\s*</Error>")
Debug.Print e
If (e <> "") Then
Err.Raise 123, , e
End If
End Sub


Note that we don't create an instance of ErrObject (we don't do a "Dim e as ErrObject"). You can't instantiate one. There's only a single Err object in the environment, and you reuse it. That's why Err.Raise takes arguments, instead of allowing you to change the value of an Err.

The definition of RegExMatch is:

' Returns the first submatch from applying the regular expression test to Source
Function RegExMatch(ByRef Source As String, _
                    ByRef test As String) As String
    Dim regex As Object
    Set regex = CreateObject("vbscript.regexp")

    Dim match As Object

    With regex
        .Pattern = test
        .Global = True
        .MultiLine = True
    End With

    Set match = regex.Execute(Source)
    If match.Count > 0 Then
        If match(0).SubMatches.Count > 0 Then
            RegExMatch = match(0).SubMatches(0)
        Else
            RegExMatch = ""
        End If
    Else
        RegExMatch = ""
    End If
End Function
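For what it's worth, the same quick-and-dirty extraction works outside VBA too. Here's a rough Python equivalent of the same idea - same pattern, same sample response. This is my sketch, not part of the original code:

```python
import re

# Sample Yahoo error response, as in the VBA test above.
response = (
    '<?xml version="1.0" encoding="UTF-8"?>\r\n'
    '<Error xmlns="urn:yahoo:api">\r\n'
    '   The following errors were detected:\r\n'
    '        <Message>unable to parse location</Message>\r\n'
    '</Error>\r\n'
    '<!-- ws01.ydn.gq1.yahoo.com uncompressed/chunked Tue Aug 11 15:44:44 PDT 2009 -->'
)

# The same not-really-XML-parsing pattern as the VBA version.
pattern = r'<Error xmlns="urn:yahoo:api">\s*.*\s*.*<Message>(.+)</Message>\s*</Error>'

m = re.search(pattern, response)
if m:
    # Python's rough equivalent of Err.Raise would be `raise Exception(...)`.
    print(m.group(1))  # unable to parse location
```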


Now you can use exception handling to deal with errors from the web service.

In this application, we really just want to mark the error and continue encoding more data.

Exception handling is nice because the function calls are nested a few levels deep. The looping is done up at a layer where we do a lot of SQL. The network communication is done within a network communication method, and there's one class in-between. You want the error on the network side to affect the behavior of the loop up in the SQL-calling layer.

With exceptions, each layer just needs a little code to catch the error and re-throw it up to the caller. Eventually it will be caught by a caller that will log the error and continue processing.

One of these days, I'll prep a nice example.
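In the meantime, here's a rough sketch of the layered catch-and-continue pattern, written in Python. All the function names and the error condition are invented for illustration; this is not the actual application code:

```python
# A network-layer error bubbles up through an in-between layer to the
# top loop, which marks the error and keeps processing the other rows.

class ServiceError(Exception):
    """Raised when the web service returns an error message."""

def fetch_location(query):
    # Network layer: pretend that "bad" input makes the service complain.
    if query == "bad":
        raise ServiceError("unable to parse location")
    return {"query": query, "ok": True}

def encode_row(row):
    # In-between layer: no try/except needed; the exception passes through.
    return fetch_location(row)

def process_all(rows):
    # Top (SQL-calling) layer: catch, log the error, continue the loop.
    results, errors = [], []
    for row in rows:
        try:
            results.append(encode_row(row))
        except ServiceError as err:
            errors.append((row, str(err)))
    return results, errors

results, errors = process_all(["a", "bad", "b"])
print(len(results), len(errors))  # 2 1
```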

# Vi and Vim, Macros

Vi and Vim have a "macro" feature to help automate routine editing tasks.

Sometimes, you get a document, a file, or some data that's just messed-up looking, or was formatted for printing, and you need to reformat it.

Most editors have some kind of macro function, where repetitive tasks can be automated. Unfortunately, these macros have, over the years, acquired an acute case of featureitis. Vim keeps it simple.

Unlike MS Word's Visual Basic, or even Excel's macros, which require some programming, vim's macros simply play back keystrokes. (There is a built-in programming language for vim, but that's a different feature.)

To record a macro, press q, then press a key to name the macro. The macro is named by a single character - you can't have a long name. You can use the letter keys and number keys. Type the keystrokes you want recorded, then press q again to stop recording. (If you use the number "2", the macro is especially easy to run.)

To run the macro, press @ and then the name of the macro. Naming a macro "2" is convenient because @ is Shift-2 on most keyboards: after the first @2, you can just keep Shift held down and hit 2 twice ("@@", which replays the last macro you ran). (My personal preference is to use the "q" key.)

## Strategy

The typical use of a macro is to reformat data. For example, I got some tabular data in a word processor. I couldn't easily extract the data, at first, but eventually figured out that it could be copied to the NVU HTML editor, which has a slightly better "copy and paste" function.

I pasted the tabular data into Vim, and got this:

1

Joe Blow

2

Harry Carey

3

Mary Christmas

4

Ann R. Key


I wanted one number and name per line. The macro was "JJJ[Enter]". The first two joins merge the number with the name (there's a blank line between them), and the third join closes up the gap below the line. The Enter key moves the cursor down to the next line.

To execute this, I run the macro over and over, until I get:

1 Joe Blow
2 Harry Carey
3 Mary Christmas
4 Ann R. Key


## Repetition

A macro can run another macro. That allows you to run a macro over and over.

If you're writing a macro named "a", and you type "@a" at the end of the macro, the macro will call itself. This will cause the macro to be run over and over!

To stop the "infinite loop", press Control-C.

Typically, I don't bother with loops. Instead, I create a second macro that calls the first macro several times. The macro might be "@q@q@q@q@q" to run the "q" macro five times. Then, run that second macro several times, by hand. Often, there are small variations in the data that will cause the macro not to work perfectly, and I have to fix it up by hand.

If there are fewer than a hundred lines, it's easier to just type the 200 or so keystrokes to get the job done, rather than get complex.

## Theory

One reason vi/vim macros are more powerful (and popular) than macros in other editors is that vi/vim has modal editing. When you're moving around the file, you are typically moving through lines and words, not characters. Joining and splitting lines, deleting words and lines, and moving to the front or end of the line, or of the file, are single keystrokes.

This is a slightly higher level of abstraction than most editors, and vi/vim forces you to use these abstractions. When coupled with macro recording, the macros are that much more powerful, because you can move around the file more precisely.

# VirtualBox OSE: can't find kernel driver, run modprobe vboxdrv

I got a message to run modprobe vboxdrv, but didn't seem to have the vboxdrv driver.

It turned out that the vboxdrv.ko object existed (turned up by doing a "locate vboxdrv"), but not for my current kernel. The solution was to rebuild the driver for my kernel. To find out what kernel I had:

uname -r

If you don't have the vbox drivers, install them:

sudo apt-get install virtualbox-ose-dkms

The vbox drivers are built using DKMS, which is a driver framework. (It allows drivers to be built apart from the kernel, so the drivers are standalone, somewhat like they are with Windows.) Normally, apt-get will rebuild the drivers automatically, but if the kernel headers are not installed, they will not get rebuilt. To check whether the headers are present, look in /usr/src:

ls /usr/src

If you don't see a directory corresponding to the kernel version number, you need to install the matching linux-headers package. To see what's available:

apt-cache search linux-headers

Run that, and you'll get a list of headers packages. Choose the one that matches your kernel version, and install it. Example:

sudo apt-get install linux-headers-2.6.32-24-server

Installing a new kernel should trigger a rebuild of all the DKMS-based drivers. Reinstalling the headers package will also trigger the rebuild:

sudo apt-get --reinstall install linux-headers-2.6.32-24-server

Then, start up the virtualbox-ose service (which loads up the drivers):

sudo /etc/init.d/virtualbox-ose start

# What Is the Difference between Access and Excel?

There's probably a frustrated IT or database person telling someone that they shouldn't be using Excel, that the data should be in Access or a database. The Excel user on the receiving end is probably wondering what Access is.

They seem similar. They both store data as rows and columns, but it's the differences that make a difference.

## Excel lets non-programmers manage data in flexible ways.

That's why people like Excel. You put your lists in there, and it's easy to add different kinds of categorization. You can use highlights, colors, boldface, italics, and different font sizes. Some people use elaborate indentation. It's pretty awesome... for human beings.

For a computer, that's all "mess". While people can tell each other "the red background means so-and-so owes money, so make them pay up first," getting a computer to deal with that is tougher.

Someone who knows how to program the spreadsheet - to do math, or comparisons, or use filters, or make crosstabs - they'll tell you to add a column, and put a 1 or 0 in there to indicate that. They can then use a filter to produce a list of people who owe money. (At this point, this person might suggest using Access. That's usually ignored.)

Incidentally, the spreadsheet was invented in the 70s to do the math, not to store data. People just started using them to store data, and when the Mac and Windows versions came out in the 80s with fonts and styling, people started using the styling to organize the data, too. That's how people are. We do things the wrong way, and if it looks nice, we think it's cool. Yeah, we're stupid, or something.

## Access lets programmers or semi-programmers manage data in flexible ways.

That clever person who could do the filters took the spreadsheet one step towards being used like a database. Access is a database system.

The main problem with using Excel to store data is that it's difficult to store large amounts of data, and manipulate it. The bigger the list becomes, the tougher the task becomes. Managing 100 rows is easy. Managing 1,000 rows is tougher. Some people end up making complex macros to perform the manipulations.

That's where Access shines - when you have a lot of data. Like in the example about the red highlight above, you can't resort to styling tricks; you have to make new columns and put values in them.

Instead of filters or macros and direct manipulation of the data, you perform "queries". A query is a database's way to extract a subset of rows from the database.

Access has a query designer.

Here's an example of a query:

SELECT name, address FROM customers WHERE debt=1

That selects the subset of customers whose "debt" value is 1, which means "true" in our system. That's not too hard, is it?
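To make that concrete, here's the same sort of query run against a throwaway SQLite database from Python. The table and the sample rows are invented for illustration; in Access you'd build the equivalent query in the query designer:

```python
import sqlite3

# Invented sample data, standing in for the "customers" table above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, address TEXT, debt INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Joe Blow", "12 Oak St", 1),
     ("Harry Carey", "34 Elm St", 0),
     ("Mary Christmas", "56 Pine St", 1)],
)

# The query from the article: the rows that used to be "red highlighted".
rows = conn.execute(
    "SELECT name, address FROM customers WHERE debt=1"
).fetchall()
print(rows)  # [('Joe Blow', '12 Oak St'), ('Mary Christmas', '56 Pine St')]
```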

## Access separates data entry from data reporting.

In a spreadsheet, the data you enter is the data you see, sort, and filter.

In a database, the data entry is separated out. Database systems typically have "forms" for data entry, and "reports" or "report writers" to print or export data.

We're all familiar with databases because we use the web. Some websites have fill-in forms. You submit them, and, generally, you get some information back, like an order number or an email. That's analogous to forms, databases, and reports: the fill-in form saves data into the database, and the page with the order number, or the email you receive, is a report. The report is mostly a big template, and your data is this tiny thing, but it's still a report.

Access comes with a form designer, and a report designer.

## It's a system

With Access, you have queries, forms, and reports. You also have a programming tool similar to macros. Each of these things is saved within a database file.

What's nice about having these things saved in a database file is organizational. Each of the things you want to do has a place to be filed, and a name. It's not all stuck in your head, like when you use Excel.

That "red highlight" example above could be saved out as a query, maybe named "qry People Who Owe Money". So someone who knows Access can read the query and run it.

In fact, you can still have a red highlight. You can do it in the reports. You can create a report that shows all the people and puts a red highlight behind the rows of people who owe money. The report can use a query that calculates the debt as its data source.

This description of what Access can do only scratches the surface.

## The problem

The main reason why people don't switch to Access is because it takes away a lot of things, like fonts, boldface, colors, and all the tools we use. What it offers is a more spartan environment, with only grids and text. So, people start using Access and think, "this sucks."

But it doesn't suck; you just haven't gotten to the good part yet. It gets good when the amount of data increases.

Excel is fine for a few hundred rows of data. Access is considered small for a database system: it's good for up to tens of thousands of rows. Database servers are typically used to store millions or even billions of rows, and hundreds of columns, across thousands of tables. It's vast.

# What is HTML 5?

HTML 5 is a marketing term (kind of like "cloud computing") that has a somewhat imprecise technical meaning, but was created so that products and people could easily sum up their compatibility or knowledge and skills.

For example, Firefox 13 is HTML 5 compliant, and this website is HTML 5 compatible, and I know how to write applications using HTML 5 features.

HTML 5 roughly corresponds to the baseline web-browser experience on a new PC in early 2012.

HTML 5 is three browser-based technologies which can be used, together, to create web pages, web sites, and web applications that begin to rival what could be done with Flash and desktop applications in the recent past (around 2003 or so). Yup - you can now do in a browser what you could do, in other ways, nearly a decade ago.

The main difference is, with HTML 5, you can deliver this experience over the internet. And it doesn't require installing any software, which is hugely important.

The three technologies are: HTML, CSS, and JavaScript.*

HTML is a way to create pages which bring together text, images, and video. CSS, combined with HTML, allows designers to change the appearance of that data. JavaScript is a programming language that is used to manipulate CSS and HTML to create pages that respond like desktop applications. JavaScript also controls much of the "behind the scenes" technology to store and retrieve data on the computer, and across the internet.

HTML 5 refers to the totality of these technologies, at specific versions, with specific features, and a roadmap for future features. It's a moving target, but one which all the major browsers are aiming to reach.

Lastly, there is one final piece of the puzzle, but it's not part of "HTML 5" - that's the web application server. The app server provides the shared experience of the internet, so many people can go onto one site and use it, together.

* Note that all these technologies have existed since the mid 1990s.

# CSS Hints for Technoids Who Forgot to Learn CSS

The original was written: 2004-11-18 03:16:46 -0700.

Here's a bit of the article:

Dang, but it took me forever to learn CSS. Maybe I should have used a book. Here, I'm going to share with you the hard-found knowledge, presented using technical programmer jargon. (Revised in 2014.)

What is Cascading Style Sheets (CSS)? The typical answer is that it's a way to separate the way a page looks from the underlying HTML, which describes the structure of the document.

What is HTML? It's a markup language used to add a hierarchical structure and formatting codes to text. The HTML and CSS are interpreted by a web browser, to display a web page.

By itself, HTML is sufficient to do formatting that's adequate for term papers, short books, instruction manuals, and other basic documents (like this document). However, it's insufficient for doing graphic design for web pages. That is what CSS is for: precise formatting of structured text.

## HTML as Code

To understand CSS, you need to understand HTML. HTML has two characteristics that programmers will understand. First, it is object oriented. Second, it's hierarchical. The stream of text is treated as a hierarchy of objects, which contain text, and also contain other objects.

Here's the simplest HTML document:

<!doctype html>
<html>
</html>

The first thing is called the doctype declaration, and it's like a header line that identifies the document. The second part is the HTML tag. There's an opening tag, <html>, and a closing tag, </html>.

Here's a more conventional HTML document:

<!doctype html>
<html>
<head>
<title>sample document</title>
</head>
<body>
</body>
</html>

Within the HTML tags are pairs of tags for HEAD and BODY, and within HEAD, there's TITLE. As you can see, it's a hierarchy of objects delimited by tags. The code, when interpreted by a web browser like Firefox or Internet Explorer, is converted into objects. The tree of objects is called the Document Object Model, or DOM.

When people think of tags, they often think "markup", but don't yet think "object". Start thinking of them as objects that contain what's between the opening and closing tags.

Here's a final example, and the one we'll use to describe and explore CSS:

<!doctype html>
<html>
<head>
<title>Hello, world.</title>
<style type="text/css">
body { font-family: Arial; }
</style>
</head>
<body>
<h1 id="headline">Hello, world.</h1>
<p class="latintext">Lorem ipsum...</p>
<p>This is regular text</p>
</body>
</html>

The HEAD object typically contains resources and information about the document, but not any text of the document. The STYLE tags delimit a block of CSS code, which the browser will use to style the page. The code within the STYLE tags is not displayed.

The BODY tag now contains three things. H1 is a heading tag. It has an attribute "id" which has the value "headline". Attributes are like object properties. The ID attribute is used by CSS to identify tags, and must be unique within a document.

The P tag delimits a paragraph. The default formatting for P is flush left, with a margin above and below. The CLASS attribute is similar to the ID attribute, but more than one tag can have the same value for CLASS. So multiple paragraphs could have the "latintext" class.

Now let's get into the CSS code. Here's the code again:

body { font-family: Arial; }


CSS is a simple language with very little syntax. That one line is called a RULE.

The part on the left, “body”, is called the SELECTOR. The part on the right, in braces, { font-family: Arial }, is called the STYLE DEFINITION. The parts inside the braces are called STYLE ATTRIBUTES and VALUES.

What that rule says is: for all tags that match BODY, set the font to Arial. Only one tag matches, and it contains the whole document.

CSS is a DECLARATIVE LANGUAGE. The programmer declares how the document should look, and the browser figures out how to find the objects that match the selectors, and then applies the style definitions to those objects.

A CSS program is a stream of rules. For any given HTML object, all the styles that apply to the object are combined, with rules further down in the stylesheet overriding the earlier rules.

For example, we could add some more rules:

#headline { font-family: "Arial Black"; }
.latintext { font-family: Times; font-style: italic; }


Aha, we see a couple different selectors. The first is a selector by ID:

#headline matches only the object with id="headline".


Next is selector by CLASS:

.latintext matches any element with class="latintext".


The style definitions are pretty self explanatory, but the selectors require a bit of explanation and example. They are extremely important, though, because understanding selectors will help you to use CSS properly. Without this understanding, you will make some mistakes that might cause problems in future iterations of your website.

CSS Selectors are a kind of querying language. The query is run against a hierarchical database of objects: the HTML DOM.

The least specific selector is the tag. After that is the class, which can apply to more than one tag. There are several more levels of specificity, which I'll discuss in a moment. Then, way over at the other end of specificity, is the ID.

There are a few other ways to query the DOM.


body h1 { ... }      Applies to situations where H1 is within a BODY.
body>h1 { ... }      Applies only to situations where BODY is the parent of H1.
p.latintext { ... }  Applies only to P tags with the class="latintext" attribute.


# What is a Server?

I've been asked this simple question, and given the simple answer: it's a PC that's on all the time, running services for others. Well, that's right, technically, but it's also the wrong answer to tell everyone.

This post is inspired by this video: A good video about servers by Eli the computer guy. (His videos are good. Kind of long and repetitive, but basically right on the money.)

The first followup question I get is usually "it's not a special computer?" Well, um, yeah it's special, but it's basically a regular computer.

A good analogy is the difference between utility vehicles. There are mini trucks, there are 1-ton and 2-ton trucks, and there are longbeds. They're all vehicles with beds; they have different capacities, different sizes, and so on, but they're basically the same technologically. The difference is that if you load up the mini truck with a bunch of bricks, you're going to damage the suspension.

A server computer generally has better performance, particularly when you have multiple people trying to access files. Servers generally have redundant hard disks in removable trays, redundant memory that is able to correct errors, multiple CPUs so the machine can survive a failure, and redundant Ethernet connections so one can burn out. That said, they are generally slower at some other things, like graphics. Usually, the graphics suck. You usually don't have as many USB ports. The fans are sometimes as loud as a small vacuum cleaner. So, as a PC, they have some real negatives.

Server software generally installs "lean" - the features aren't turned on. Years ago, servers used to ship with many features turned on, but all the sysadmins wasted hours removing the stuff they didn't want. So the new style is to deliver the software with everything turned off.

The latest Windows Server, 2012, can even be installed without graphics. Likewise, if you install a Linux server, you may or may not get graphics; you're going to get almost nothing. You have to install that stuff later.

The weird thing, of course, is you get less, and have to pay $2,400 for the software. LOLZ! The same goes for the hardware. It's expensive. Is it worth it? Well, that depends. You usually don't have a choice. The server will be more reliable. If you want full redundancy, you have to get the expensive software, and set it up. The only exception is if you're building on Unix and can build a redundant network of cheap computers. It's doable, but it's also a lot of work.

Mathematically, if you spend $10,000 on a couple of good servers, your redundancy (and performance) isn't going to be as good as a network of ten $1,000 computers. The network will be around 8 or 9 times faster, and the reliability will be, I'd guess, several magnitudes better.

Think about it - what are the odds that a crap computer will expire this year? Pretty high. But what are the odds that 5 will expire? Probably close to nil. Not only that, but the time to replace that computer is around a day. Just go to the store, buy another computer, and rig it to replace the failed box. When was the last time you lost half your LAN within a couple of days or a week? Yeah, I've never really heard of that happening before either.

On the other hand, an expensive server might be less likely to fail, but there's still a chance. When it fails, you're left with one other computer. Again, odds are, it's safe. But your risk exposure is going to be a day or two to get a rental server, and a week to a month until you can get the broken server replaced. The only way to eliminate that exposure is to buy a third computer.

The trade-off is that it's much easier to administer two servers than it is to manage a network of 10 machines. It uses less space, less electricity, and the overall setup is just less complex. So for a small business, the simpler solution is the right one, even if it's riskier and really a bit more expensive. For a data center, the cheap path is the way to go.
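To put rough numbers behind that intuition, here's a quick binomial calculation. The 10% yearly failure rate is an invented assumption for illustration, not a measured figure; the point is just that "some box dies" can be likely while "half the LAN dies" stays rare:

```python
from math import comb

# Invented-for-illustration failure model: each of 10 cheap servers
# independently dies within the year with probability 0.1.
p, n = 0.1, 10

def prob_at_least(k, n, p):
    """P(at least k of n fail) - plain binomial arithmetic."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(round(prob_at_least(1, n, p), 3))   # 0.651 - losing *some* box is likely
print(round(prob_at_least(5, n, p), 4))   # 0.0016 - losing half the LAN is very rare
```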
You not only have less risk, you're also going to develop the technology to grow the network via this redundancy. The Amazon EC2 model is to run a bunch of cheap computers, and then have them basically act like a smaller network of extremely reliable computers. They charge double what web hosts charge, but they scale up. So at some point, when you outgrow a web host and are faced with buying an expensive server, the EC2 system ends up cheaper. Since the system runs on cheap hardware, Amazon makes money off the margin between the cost of a cheap redundant network and an expensive, less redundant server. (Note: Amazon probably buys custom-made computers that are even cheaper than regular PCs. They're probably motherboards with a couple CPUs, no graphics, no disk IO, and no power supply.)

## Video Comment

ECC registered RAM is memory with a couple of parity bits, similar in spirit to RAID-5. It's worth it. I had a server that experienced around 5 RAM errors per year. ECC is not just memory, but infrastructure that tells you that RAM is failing or flaky.

# What is my IP?

# WiFi: Improving Reception with a Chip Bag

The best ways to improve WiFi reception are antenna positioning and using reflectors to guide the signal.

I have a PCI WiFi card that's positioned in a terrible location. It's in the bottom slot, and is surrounded by coiled-up power cords, computer cables, and other crap. So, my signal is weak. I get "2 bars".

On a lark, I took a big potato chip bag (Lay's) and cut it into a big rectangle. Then I washed the oil off, and taped it to the wall behind the WiFi router's antenna. I got an immediate one-bar improvement in signal. It wavers between two and three bars now.

The bag has a thin layer of foil, like most chip bags nowadays. That foil works as a reflector, bouncing the signal in my general direction, and also reducing interference from the other side. There's no curve in this version of the reflector.
A curve would tend to focus the signal, and thus narrow its coverage. I didn't really want that because the signal's used throughout the house. Eventually, I'll glue this foil onto cardboard, and add a very slight curve to it to direct the signal a little bit more toward me. Mounting it on cardboard would remove some of the crinkles, which are probably distorting the signal.

Also, using old soda cans, I'll set up some parabolic reflectors for my WiFi card's antenna, and also figure out how to deal with the wire and noise problems. The real solution is probably to get an antenna that can be positioned higher up.

Attachment: wifireflector.jpg (14.57 KB)

# Windows Backup, Backup Exec, and System State Recovery

I was using Backup Exec to maintain several backups of Microsoft Windows Server 2003. The backups were kept on a different server (a small NAS box).

*** Impatience: One time, however, the system had a very hard time starting up, and the user interface eventually slowed to a crawl. It was impossible to interrogate the system, much less do any work on it. (It could have been malware - I'm still going to have to try and fix this box.) Out of impatience, I rebooted the machine by holding down the power switch. (Never do this again. Wait it out as long as possible.)

This corrupted the Active Directory database. Subsequent boots failed, and the OS was telling me to boot into Active Directory Restore Mode. So I did.

A search of help found some instructions on testing the AD databases. The tests failed. Repair also failed. (The weird thing was, the JET db engine seemed to be failing.) A web search said to perform a restore from backups instead.

AD is normally backed up as part of the "System State". In Backup Exec, System State is backed up as part of the Shadow Copy Components. Shadow Copy is an OS feature that snapshots a file so it can be backed up -- this is necessary for backing up the numerous files that make up the operating system.
*** Can't start up Backup Exec: Unfortunately, by booting into AD Restore Mode, the domain controller (which wouldn't work anyway) was down. The Backup Exec services were configured to log on as a member of the domain, so BE wasn't starting up, and thus a restoration was not possible.

A quick read of the BE help found an article about how to restore System State when in AD Restore Mode. You have to modify the services to start up as the Administrator. Once started, you have to perform a restoration of System State, making sure that the credentials are set to Administrator.

*** Can't get to backup files: Also, the NAS had been set up to use AD as well, so it was inaccessible. There was a non-AD username and password on there, so I re-logged-in as that user, and everything was okay again.

A System State restoration was performed. The data was checked using NTDSUTIL, and it was okay. So a reboot was performed, and the system came up fairly quickly and without incident.

*** It took 1.5 hours: The main problem was, it took 1.5 hours to perform the complete restore. This was due, partly, to reading instructions, and partly to the time required to start up the server. A better backup configuration could have kept the downtime to around 30 minutes. This is described below.

Solution: use Backup to save System State, as well as Backup Exec. Backup is the built-in backup software. Unlike Backup Exec, it will run without AD. It can be scheduled to create a System State backup once a day, to a local disk. Once a week, it could create another, slightly longer-term System State backup.

This way, we can quickly restore to yesterday's state after a single reboot. (This would take 10-15 minutes.) If that fails, we can try the weekly backup (another 10-15 minutes). If that fails, rely on the backups on the remote server or tapes.

# Windows Remote Desktop on Windows 7, Speeding Up Slow Performance

If Windows 7 seems much slower over Remote Desktop, try changing the theme.
Click the Start Menu, then type "theme". The option to change the theme should appear. Click on it.

Set the theme to "Classic" (it's near the bottom of the list). This removes all the gradients, leaving the system looking like Windows 2000. Without the gradients, everything will be faster.

# Windows Small Business Server 2003: disabling sbscrexe.exe

This is an awesome tutorial on how to kill this annoying process that forces the owner to run SBS as a domain controller. It's also a great howto about permissions in Windows.

We have to do this because Microsoft Marketing decided that every copy of SBS should run as a Domain Controller. If you happen to have two licenses of SBS, and want to turn one SBS into a plain-old-server for file-serving purposes, or some other lesser use that would benefit from a leaner OS setup, you cannot. SBS forces you to do the "domain" thing, or you can go purchase another license for the regular Win2k3 Server.

This is why I prefer to deal with Linux. Less marketing-scheme BS. Almost everything is licensed free, per seat, per CPU, or per machine.

http://forums.speedguide.net/showthread.php?t=173731

Here's the text, by "Blarghie", copied here:

Paft's original post drew me to this thread after a google search. I also didn't want to have to bother with this crap that my legitimate copy of Windows SBS couldn't run unless it was a DC. As it happens, we already had a second licence of SBS and simply wanted to re-use a currently un-used licence of SBS to implement a webserver, but without all the bloat that the SBS install affords.

The first thing I did was to install the server normally. The first chance you get to cancel the install of SBS bloat is when Windows starts for the first time after install; I seized my opportunity.
What I didn't see, however, was the quite frankly ridiculous scenario whereby Microsoft had decided to force-restart the server every hour and NET SEND spam "this server doesn't comply with licensing requirements" across the entire network. Microsoft can stick that.

Anyway, like I said, it was Paft's post that brought me here to the forum, and I've found a slightly more elegant solution to this problem, rather than just aggressively killing the process until Windows gives up trying to start it again. I'd like to share it in the hope that Google will re-index and pick it up for others to use. You may have noticed this service cannot be disabled via the MMC snap-in. My search term on google was: how to stop the SBCore service

Anyway, down to business...

Tools you'll need: Process Explorer from www.sysinternals.com

As you probably know, you have a service called SBCore or "SBS Core Services", which executes the following process:

C:\WINDOWS\system32\sbscrexe.exe

If you kill it, it just restarts - and if you try and stop it, you are told Access Denied. If you fire up Process Explorer, you can select the process and Suspend it. Now we can start to disable the thing.

Run RegEdit32.exe and expand the nodes until you reach the following hive / key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\SBCore

Right click this, hit Permissions, and give the "Administrators" group on the local machine full access (don't forget to replace permissions on child nodes). F5 in regedit and you'll see all of the values and data under this key.

Select the "Start" DWORD and change it from 2 to 4 - this basically sets the service to the "Disabled" state as far as the MMC services snap-in (and Windows, for that matter) is concerned.

Next, adjust the permissions on the file C:\WINDOWS\system32\sbscrexe.exe so that the EVERYONE account is denied any sort of access to this file.
Then go back to Process Explorer and kill the sbscrexe.exe process. If it doesn't restart – congratulations! Load up the services MMC snap-in and you should find that "SBS Core Services" is stopped and marked as Disabled.

Regards,

# Windows XP Boot USB

1. A USB drive. I ended up with a SanDisk OEM'd one from Staples.
2. Ultimate Boot CD for Windows is good because it has a lot of tools. There are others, but many seem to be based on Bart PE.
3. PE to USB takes the Bart PE output and writes a bootable USB drive.
4. If it BSODs, HERE is a fix.

# Windows XP, Installation Acrobatics with Product Keys

I've wasted around five hours this past week dealing with miscellaneous Windows XP licensing issues. Ever since Microsoft (basically) started tracking users, it's been difficult to resort to the tried-and-true technique of "justified piracy" to maintain one's legitimate software license rights. After all, the way 99% of the world sees it, if you paid for a licensed copy but lost the CD, you're entitled to use the software legally.

What happened was, my work didn't have a copy of the Windows XP installer CD. Instead, it had a "Volume License Key" CD, which is a slightly different installer that accepts only VLKs, as I learned when I tried to install XP using the key on the sticker on the computer.

Because they had the VLK CD, I proceeded to call MS to see if they had a VLK agreement. They wanted me to assent to what amounted to a software audit of the site's Windows licenses. That seemed a little dicey, because there were definitely duplicate keys in use, but each machine also had a genuine XP sticker on it with a unique key. The admins probably used the same key out of habit when they upgraded boxes. It was also going to take days to process this bureaucratic mess.

So, the internet came to the rescue. I found some VLK serial numbers on the internet, and installed from the CD. Then, I went in and tried to change the key.
This invalidated the original key, and forced me to use Microsoft's tool to change the key. After supplying the original key from the sticker on the computer, everything went smoothly.

For instructions on modifying the key: http://www.michaelstevenstech.com/xpfaq.html

If you want to avoid all this hassle, use Linux.

# Windows XP, Windows Server 2003 Miscellaneous Links

How to Install DLLs with Regsvr32

# Windows: Drive Mapping Weirdness, Lost Data

There are a few weird situations with Windows and drive mapping that should be noted. One situation points to bugs in Windows, and the other to some malware.

## Windows Weirdness

If you are on a server, and a drive is mapped to a shared folder that's also on the local computer, you can lose data. I was updating an ADP installation, and the data was in D:\ADP (not really; the path is just for explanation purposes). This was also shared as \\Accounting\ADP, and that share was mapped as drive F:.

Running the update on the data in drive F: didn't work. Additionally, attempting to update the files in D:\ADP seemed to cause Windows to re-resolve the path back to drive F:. So I had to unmap drive F:, then perform the update. The data loss was recorded as some kind of SMB event. It was all very strange, especially because, even though the files were shared, they were on the local machine, so it's not like the data really went over the network. This indicates there's some kind of error in the file server software, or in the networking software.

## Malware

Another time, on a client computer, the drive mappings to the file server were working, but access to the shares via UNC paths (i.e. \\fileserver\sharename) wasn't working. I never figured this out, but running the free F-Secure malware scan fixed it. My guess is that the name resolution was being intercepted, and mapped drives may connect to the server through a more primitive API that doesn't rely on name resolution. This is just a guess.
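The workaround in the "Windows Weirdness" story above amounts to a couple of commands at a prompt. This is only a sketch, using the stand-in names from the example (F:, D:\ADP, \\Accounting\ADP):

```
rem Drop the mapping so the update hits D:\ADP directly.
net use F: /delete

rem ...run the update against D:\ADP here...

rem Restore the mapping afterwards.
net use F: \\Accounting\ADP
```
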
# Windows: Installing Printer Drivers for 64-bit Clients with a 32-bit Server (Printer Driver Hell)

The Windows print server works like this: when you go to a server and double-click on a printer icon, the system downloads a printer driver from the server and installs it into your local copy of Windows. The server has a small database of printer drivers, and each driver has a list of compatible operating systems and "bits" (word sizes and machine architectures). An x64 system will try to download the x64 driver. If it's not available, it will download the 32-bit drivers.

The drivers are typically installed by running an installer on the server. This unpacks the drivers and then installs a printer, and that printer is then shared. If you need to install drivers for different architectures, you can usually download the files and install them via the printer's properties.

Unfortunately, the Xerox Global Printer Driver system does not offer only the driver files. All the files are bundled up in an executable: the 32-bit drivers are in a 32-bit executable, and the 64-bit drivers are bundled in a 64-bit executable. So if you have a 32-bit server and a 64-bit client, you are in trouble, because you cannot run the 64-bit driver bundle on the server. What you need to do is unpack the bundle on the client, and then install those files on the server via the client. After that's done, you must re-install the printer. Here it is in detail.

- First, as an administrator, install the Xerox Global Printer Driver, 32-bit, on the server in the regular way. Run the installer, but do not install a printer. Go to the printer properties and specify a new driver. Point the installer to the driver files, and they'll be loaded up.
- Log in as a domain administrator on the client.
- Download the Xerox Global Printer Driver system. Use the PostScript version; the PCL6 version seems to fail.
- Double-click the installer, and watch it unpack the files into a folder on the C: drive.
  Remember that folder. It will ask if you wish to install a printer; decline that offer.
- Go to the server and double-click on the icon for the printer on which you wish to install 64-bit drivers. The printer should auto-install itself.
- Open the printer, and open its properties. You should get an error alert because the driver is the wrong type. The properties should appear shortly.
- Click on the Sharing tab, and click on the Additional Drivers... button.
- Check off x64, and click OK.
- Browse to the drivers folder (the one you remembered above), and click OK.
- It will ask again for a GPD file. It's in that same folder, so specify it, and click OK. The drivers should start uploading from the client to the server. (That's how the files will get up there.)
- Next, you must go to your Printers and Devices and delete the icon for that printer.
- Go to the server, and double-click that printer's icon. This time, the system will ask you to install drivers.

What happened is, the first time you installed the printer, it installed the 32-bit drivers because the 64-bit drivers were not available. The second time, it found the 64-bit drivers and installed them. As a final step, test the installation.

# Windows: changing the password for a network share

If you have a network share mapped to a drive letter, and it stops connecting because the password changed, Windows won't ask you to correct the stored password, or even let you delete it. To fix this, go to the User Accounts control panel (type "user accounts" into the Start Menu's search). Click on "Manage User Accounts", then the "Advanced" tab, then "Manage Passwords". All your stored credentials will be listed, and you can delete or change them from here. Deletion is easier, because when you reconnect, it'll show the password dialog a few times until you get it right.

# Xubuntu Process List Notebook

Ever wonder what Xubuntu is running when you start up? Here's a hyperlinked document based on running "pstree -A".
```
init-+-NetworkManager-+-dhclient
     |                `-{NetworkManager}
     |-Thunar---{Thunar}
     |-acpid
     |-atd
     |-avahi-daemon---avahi-daemon
     |-bluetoothd
     |-console-kit-dae---63*[{console-kit-dae}]
     |-cron
     |-cupsd
     |-2*[dbus-daemon]
     |-dbus-launch
     |-dd
     |-gam_server
     |-gconfd-2
     |-gdm---gdm-+-Xorg
     |           `-sh-+-ssh-agent
     |                `-xfce4-session
     |-gnome-keyring-d
     |-gnome-power-man
     |-gnome-screensav
     |-gvfsd
     |-hald---hald-runner-+-hald-addon-acpi
     |                    |-hald-addon-cpuf
     |                    |-hald-addon-inpu
     |                    `-hald-addon-stor
     |-klogd
     |-6*[login---bash]
     |-nm-applet
     |-nm-system-setti
     |-notification-da
     |-python
     |-syslogd
     |-system-tools-ba
     |-udevd
     |-update-notifier
     |-wpa_supplicant
     |-xfce-mcs-manage
     |-xfdesktop-+-firefox---5*[{firefox}]
     |           |-orage
     |           |-xfce4-terminal-+-bash---pstree
     |           |                `-gnome-pty-helpe
     |           `-{xfdesktop}
     `-xfwm4
```

# Your Computer Has Been Reinstalled

System Name:
Owner's Account:
Administrator's Password:

Your computer has been wiped clean and reinstalled. Your data was backed up as best as possible, and has been restored to your "My Documents" folder.

An extra account, named Limited User, has been created. This user lacks the permission to install software. For additional security against viruses, use the Limited User instead of the owner's account.

The following have been installed:

- Norton Antivirus – which came with your computer.
- Firefox – this is a replacement for Internet Explorer, and tends to be a less popular target for virus attacks.
- Microsoft Office – this was on there before.

A CD is provided with the following:

- Drivers for your computer – they were downloaded from the manufacturer's website.
- Several trials of anti-virus products, including Avira and ClamAV, which are free.

Your Norton anti-virus expires in 90 days, and you will either need to start paying for it, or purchase another antivirus program like Avira, Kaspersky, or Trend PC-Cillin.
(There's a new ad gimmick at http://www.trialpay.com/ where you can get "free" antivirus software by buying products you don't need, and getting on all the junk-mail lists.)

# dOWN wITH cAPS lOCK

Sick of tYPING lIKE tHIS? Wish you could press Control-C without contorting yourself? If you're on Ubuntu, there's a feature to help you out:

On KDE (on Gentoo at least), it's under Control Center -> Regions and Accessibility -> Keyboard Layout -> Xkb Options, in the list.

[Screenshot: Keyboard Preferences]

# nohup - runs your programs after you log out

The following command will run the script, and then keep running it after you log out.

```
nohup ./somescript.sh &
```

Miracle? No. nohup just ignores the hangup signal (SIGHUP), so the script never receives it, and thus won't exit when you log out.

The interesting thing about the command is that it gives you an idea of how easy it is to write nonterminating programs. You have to do work (or let the library do the work) to automatically exit the program when the user exits.

# rss2txt: RSS Headlines Output as Text

This is a script that takes an RSS URL as an argument, and emits the headlines. Potentially useful if you have a small text-reading device that doesn't handle HTML.

```perl
#!/usr/bin/perl
use XML::RSS;
use WWW::Curl::Easy;

my $curl = WWW::Curl::Easy->new();
$curl->setopt( CURLOPT_HEADER, 0 );

# Throw away the HTTP response headers.
open DEVNULL, ">/dev/null";
$curl->setopt( CURLOPT_WRITEHEADER, \*DEVNULL );

$curl->setopt( CURLOPT_URL, $ARGV[0] );

# Collect the response body in an in-memory scalar.
my $response_body;
open( my $fileb, ">", \$response_body );
$curl->setopt( CURLOPT_WRITEDATA, $fileb );

my $retcode = $curl->perform;

# Parse the feed and print each item's title.
my $rss = new XML::RSS;
$rss->parse($response_body);

foreach my $item ( @{ $rss->{'items'} } ) {
    print $item->{'title'} . "\n\n";
}

close DEVNULL;
```