How to Backup

How to Backup is a free online mini-book that explains the basics of how to backup your network, including backup technologies and backup strategies that keep your systems online and your data available.

How to Backup is a simple read. It doesn't get too theoretical. It doesn't cover enterprise backup - it's for small businesses and home offices.

You think you know what a backup is, but do you really?

What is backup?

A backup is a copy of your data.

A backup is an archived copy of your old data.

A backup is a system that can be used to deliver your data, if the primary system fails.

A backup is a system that keeps operating, transparently, even if part of the system fails. It's fault-tolerant.

A backup lets you recover from bad data, quickly.

A backup with frequent incremental backups lets you undo a huge run of bad data.

A backup is a part of a system that costs less than the entire system, that allows nearly all people to keep working in the event of an equipment or data failure.

Links to variations on this booklet

How to Backup the Network at Home

Archives and Archiving Files and Documents

Archiving is different from backups. Think about them separately.

An archive is an organizational strategy for data. It's a structure into which data can be stored in a way that makes it easy to retrieve the data in the future.

There are a few different ways to organize information. To use some computer terms: "tables", "time", and "hierarchy".

Tables refers to database tables, where data is organized into records and fields (or rows and columns). A record is a unit of data, like a row in a list. A field is information about the data, or the data itself, like the columns in a row. The useful property of a table is that every row has the same columns, so you can sort and group by columns.

Name    Age    Sex
John    40     M
Rosa    36     F

A hierarchy is like a filing system of folders.

Chronological organization means organizing information by time, so you can retrieve the data from a specific time period.

The computer's file system uses all three methods of organization. Each file has common fields, like the modification time, size, and usually a file extension.

The files are stored in a hierarchy, and people typically name the folders uniformly. This uniform naming breaks up the filename into fields, so it's easier to sort through the files.

For more info, see the file naming convention articles below.

The file system generally lacks the ability to add extra fields of data. For example, it would be useful to be able to attach major and minor version numbers to every file. While there are some ways to do this, there isn't a simple way that exposes itself through the user interface, easily.

Consequently, the folder hierarchy is usually used instead of extra fields. It's not a bad or good thing - it's just how we do it. For some examples of this, see the folder organizing articles below.

Good archiving can assist backups by breaking the file system into parts. For example, if the folders are organized by client, you can split up the backups by client. Then, you can direct archives for old clients onto specific media, which might be kept offline or offsite. With very little work, you can cut down the time required to backup adequately -- and that translates into a greater capacity for the entire backup system.

File naming convention with dates

The file naming convention I use starts the name with a date: YYMMDD-file-name.ext

If I'm making revisions, I add initials and revision numbers separated by a dot or a dash: YYMMDD-file-name-x.ext or YYMMDD-file-name.x.ext

Similar conventions are used for folder names.

Though the system adds modification times, I still put the date into my file name, because the system's time and date can be lost. If a file is emailed, the creation date can be lost. Putting the date in the filename helps retain this extra data.

Using the date in the filename also helps with the naming. Typically, I'm working on things for other organizations or people (for money), so I can name a file with the date and the other party's name. As new files are created, I don't have to invent new file names.

If there are multiple projects, just add the project name. The date assures that there's no need to invent new names all the time.
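The date-prefix convention above can be sketched in a few lines of shell. This is only an illustration: the function name dated_name is made up for this example, and it assumes the two-digit-year YYMMDD form described above.

```shell
# Hypothetical helper: build a YYMMDD-prefixed file name.
# Usage: dated_name NAME.EXT [YYMMDD]   (the date defaults to today)
dated_name() (
    name=$1
    stamp=${2:-$(date +%y%m%d)}
    printf '%s-%s\n' "$stamp" "$name"
)

# Example: rename a new file so that it sorts by date.
#   mv proposal.doc "$(dated_name proposal.doc)"
```

Because the date comes first, a plain alphabetical file listing is also a chronological one (within a century, given the two-digit year).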

File naming conventions for routing documents past multiple editors

In a typical office, several people have to read a document - the writers, the editors, a manager, the signatory to the document, and possibly some artists.

In many offices, this is carried out over email. The problems with this technique are many, but for the backup administrator, the main problem is that each mailed file consumes space in the mail server's file system. It wastes space and network resources.

It also fails to scale up past small documents. Imagine editing long documents this way - it's not realistic.

The standard solution is to have everyone work on a shared file system.

Some offices use a system of "folders" where a document is edited, and versions are moved from one folder to another -- each folder acting as a kind of inbox and workspace. The folders within a project may be named "source", "edit", "review", "signed". Specific people look at each folder, and work on the contents within.

Some offices use project names, but others use project numbers. Numbers may actually work better than names, because people are generally good at mapping numbers to names, but not as good going the other direction (think about how much easier it is to see a phone number and identify the caller than it is to remember a phone number). Not only that, but, numbers are more precise than words -- people won't mix up "9099" and "9080", but they may mix up "Ford" and "Ford Foundation" and thus create confusion.

Some offices alter the file name of a document as it's modified. For example, you start with a document named "2010-Tribe.doc". As it gets edited, the file accumulates editor initials: "2010-Tribe.a.doc" then "2010-Tribe.aj.doc", and so forth as each person reviews the work.

Because the name changes, the backup software that runs every night will save each revision of the file separately. Similarly, if you use file syncing software, you can accumulate revisions onto your backup.
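The initials scheme can be automated with a small shell function. This is a sketch, not a standard tool: the name add_initial is invented here, and it assumes the base name contains no dots, so that a name is always either BASE.EXT or BASE.INITIALS.EXT.

```shell
# Hypothetical helper: add an editor's initial to a routed file name.
# Assumes the base name has no dots: BASE.EXT or BASE.INITIALS.EXT.
add_initial() (
    name=$1
    initial=$2
    ext=${name##*.}      # text after the last dot
    rest=${name%.*}      # everything before the last dot
    case $rest in
        *.*) printf '%s%s.%s\n' "$rest" "$initial" "$ext" ;;  # extend the initials
        *)   printf '%s.%s.%s\n' "$rest" "$initial" "$ext" ;; # first reviewer
    esac
)

# Example: save a reviewed copy under the next name in the chain.
#   cp 2010-Tribe.a.doc "$(add_initial 2010-Tribe.a.doc j)"
```

Copying rather than renaming keeps the earlier revision in place, so the nightly backup sees both files.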

File naming conventions for websites

Websites are archives. A website that isn't an archive is one that displays a lot of "404 errors" - file not found.

Perhaps more than other kinds of archives, it's important to plan the archive out for accepting new files for a long period of time. That's because websites get links, or what some call "deep links", which are links on pages past the so-called "home page". (I think it's a stupid distinction - a home page is only for branding and frequent users, and there are few of the latter. Most traffic comes from links and search engines.)

When you rename or move files, you break all the links out there. That's the fabric of the web.

To avoid this problem, you have to break up your system into manageable chunks, and you have to do it from the start.

If you expect to upload new image files every day, you should plan to have a system that can handle 365 files per year, and 3,650 files per decade. A single folder might be sufficient for the first 365 files, but, things get unwieldy at 3,650 files if you have to look at the files and pick them. Even the network will slow down when you get a file listing.

The solution for that is to use dated folders.

If you expect to get few images per year, except during events, when you get hundreds of images, the obvious solution is to create one folder per event.

I like to prepend the year to the event, so you get names that sort by date, like 09picnic. If that's not precise enough: 090815-picnic.

Uppercase?

You can use upper and lower case, but at your peril. Windows and Mac file systems are case-insensitive by default, but Unix is case-sensitive. That means in Unix, "Car.jpg" is different from "car.jpg", and both are different from "car.JPG".

On Windows and Mac, all three are the same file. The hazard is that you create the three files on Unix, and then copy them to a Windows or Mac machine, and end up with only one file (or an error).

The convention is to use all lowercase for naming files on Unix.

To avoid problems, rename your Windows files in lowercase if they are destined for the Web.
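Renaming by hand gets tedious, so here is a minimal sketch that lowercases every name in one folder. The function names are invented for this example, and it assumes the files have already been copied to a case-sensitive Unix file system (on a case-insensitive one, the "already exists" check would trip on the file itself).

```shell
# Hypothetical helper: print the lowercase form of a file name.
lower_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Rename every entry in one directory to lowercase. Names that are
# already lowercase are left alone, and nothing is overwritten.
lowercase_dir() (
    cd "$1" || exit 1
    for f in *; do
        [ -e "$f" ] || continue            # empty directory
        new=$(lower_name "$f")
        [ "$f" = "$new" ] && continue      # nothing to change
        if [ -e "$new" ]; then
            echo "skipped $f: $new already exists" >&2
        else
            mv -- "$f" "$new"
        fi
    done
)

# Example:
#   lowercase_dir /path/to/site/images
```

The sketch does not descend into subfolders, and it does not fix links inside the HTML files that still refer to the old names.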

Separate HTML files from image files?

Most websites have all the images in an images directory, and the HTML files are in other directories, or are in the "root" of the server (the topmost directory).

This is probably because a single HTML file tends to include more than one image. Thus, as the site grows, moving images into their own directory just makes sense - it's a quick fix to the problem of growth.

Suppose each page includes three images. Then, each new page causes four files to appear on the server. 100 pages later, there are 400 files.

By moving images into a directory, the 100 pages cause only 100 visible new files in the directory.

Backup Laptops with a Dock

If you have a laptop that you travel with, consider getting a dock for your office desk. If your laptop isn't dockable (because it's a "home" laptop computer), then, get a universal dock. A universal dock is a dock with a USB connector, and an internal USB hub. (You might call it a glorified USB hub.)

To the hub, attach a USB hard drive or USB flash drive. The flash drive is better, because it uses less power.

Get some "sync" software that synchronizes folders. Some will initiate a sync when the drive is connected. Windows users may use a tool like Allway Sync or FolderClone. Set it up to backup the My Documents folder and perhaps the Desktop as well.

Every time you dock and log in, the software should sync and backup your important documents to the USB flash drive.

See the article Backup to External Hard Disk or USB Flash Drive for more ideas.

Backup Tapes

Backup tapes are a popular backup medium, but recently have become more expensive than disk. It's cheaper to use hard disks for backup.

Backup tapes have some advantages. They are smaller than disks, so you can pack more into a box, and send it to an archival location. Usually, backup tapes are stored in a cool, dry room. They are more durable, in that a shock to the backup tape won't cause failure, whereas disks may have a head-crash.

There are many different types of backup tapes, from the old 9mm and serpentine formats to Travan and DAT. The main backup tapes out there are the 4mm tapes used with DAT drives.

Enterprises (meaning businesses with scale and money) still buy backup tapes.

Consumers (meaning everyone else) have moved on to disk-based backups. Backup tapes cost more per megabyte than disks, unbelievably.

I guess there are a lot of enterprises overpaying for their data.

Backup to CD-R and CD-RW

Backup to CD shares a lot of problems that backup to DVD has, with some interesting differences.

The main difference is that a CD holds about 700 megabytes, roughly 1/7th the capacity of a single-layer DVD, so you can't backup as much data. Consequently, the backup is "faster" because the data set is smaller.

So, CDs are basically not good for backing up your system, but, are a great way to make archival snapshots of your work-in-progress.

For example, if you wanted to retain 7 days of your past work, you can purchase 7 CD-RWs, and label them "Monday", "Tuesday", "Wednesday" and so forth. Put them in jewel cases, and then into a CD box.

Each day, either at the start or end of a day, run a backup of your work, and then store it. It won't take more than 10 minutes.

For your effort, you are rewarded with an archive of your most important data, at your fingertips.

Backup to DVD

Creating backups on DVD-R or DVD-RW allows you to store up to 4.7 gigabytes of data on a single-layer disc (or about 8.5 on a dual-layer DVD-R).

The main advantages:

  • low cost
  • archival, by default
  • widely supported, and readers are common

The main disadvantages:

  • slow write speeds
  • limited capacity
  • data is easily damaged

If you are going to backup to DVD, get an SATA DVD burner.

Chances are, you're only going to do a data backup, so, make duplicates of all your installer CDs and DVDs first. Make a disc with all the downloaded installers, and all the serial codes.

Then, backup the data. You may need to partition your data on the disk, and set up different backup jobs, to spread the backup across multiple DVDs. Prepare to spend a lot of time waiting.

Another disadvantage is that you can't always choose backup software, because the burner may not work with generic DVD burning software.

That all said, a DVD is very light, and easy to mail. It's a great way to make a weekly backup of a large project that can be sent off-site "just in case". It also gives your client something solid in exchange for paying their weekly invoice.

Backup to External Hard Disk or USB Flash Drive

A simple, transparent way to backup a personal computer is with an external hard drive or USB flash drive.

You don't need special software to do this - just copy the files.

The real issue is getting your files organized so all your documents are saved to the disk in one simple motion. (See organizing your files.)

Also, if you're a data-completist, you'll want to save the settings files (the .dotfiles in Unix, and the hidden Application Data folder in Windows).
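On a Unix-like system, "just copy the files" can be as short as the sketch below. The function name and the "backup" folder on the drive are choices made for this example, and it assumes the drive is already mounted.

```shell
# Hypothetical helper: copy a documents folder onto a mounted drive.
# cp -a copies recursively and keeps timestamps and permissions
# where the drive's file system supports them.
backup_to_drive() (
    src=$1      # e.g. "$HOME/Documents"
    drive=$2    # mount point of the external disk or flash drive
    [ -d "$src" ]   || { echo "no such folder: $src" >&2; exit 1; }
    [ -d "$drive" ] || { echo "drive not mounted: $drive" >&2; exit 1; }
    mkdir -p "$drive/backup"
    cp -a "$src" "$drive/backup/"
)

# Example (the mount point is a placeholder):
#   backup_to_drive "$HOME/Documents" /path/to/mounted/drive
```

Running it again overwrites the earlier copy file by file, but it never deletes files from the drive that you have since removed from the original folder. Sync software handles that case.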

If you wish to automate the process, some of the best software to use is "sync" software that compares the copies to the originals, and updates the copies automatically. The program I use is Allway Sync. There are others as well, but I found the interface to Allway Sync easiest to comprehend.

External hard disks have two risks. One is that the power adapter may fail. Another is that, because the drive is in a mobile case, you can drop the disk and have a head crash.

USB flash disks are less prone to damage, but it's possible to put them in your pocket, forget about them, and toss your pants into the washing machine, destroying the device.

USB flash drives also tend to be fragile because they stick out of the USB port. If you want to install one permanently, get a cheap USB extension cord at the dollar store, and tape the drive to your case.

A good backup solution for someone who isn't computer savvy is sync software, a USB flash memory drive, and the aforementioned extension cord. Set it up for them, and tell them to store their documents in only one folder.

External Hard Drive Backup Tips

If you're going to use a large external hard drive, for archival or simple backup purposes, here are the pros and cons of different cases:

External "Toaster"-type adapters

These are square blocks with a slot on top that accepts a SATA hard drive and connects to your computer through USB or eSATA.

The pluses are convenience, cost, and speed.

The minuses are the risk of metals shorting out the drive electronics, and a lack of heat dissipation.

External case with fan

These are the best cases - until the fan fails. Then, it's not so great. My personal experience was that the fan failed after a year.

The pluses are the fan.

The minuses are the risk of the fan failing - potentially leading to a hot hard drive.

External case without a fan

These are the second best cases. The ads say that the case is designed to pull heat away from the hard drive. It works as advertised, but the heat must then be removed from the case. So the entire case needs sufficient ventilation.

Pros: nothing to break.

Cons: you still need to figure out a way to remove heat.

Power supplies: a universal problem

For whatever reason, the power adapters I've used with these external hard drives have generally been junk. They'll last 1 to 2 years, and fail.

There's no simple solution out there, except to buy another adapter. Make sure the adapter is on a hard surface with good circulation.

(A possible solution is to hack a PC power supply and use that to power all your electronics.)

Backup to Floppy

Backing data up to floppy went out with the 1990s. Hardly any computers have floppy disk drives anymore.

That's why it's important to gather your floppy disk based backups and move the data onto a hard disk.

When you do this, you'll be shocked at how much data's been lost to floppy disk degradation.

You may need to clean your floppy disk drive heads if you are a smoker. If you can't find one of those cleaning disks, you can fake it by taking a floppy disk that you aren't going to need and pouring a little alcohol on both sides of the disk. Pop it in for three or four seconds while the heads slide across the disk's surface, then pop it out. The risk here is that the diskette will shed material before the heads get cleaned - so use a new diskette.

Floppy disks are still useful for doing system installations or restorations, so you still see them on some back-office systems.

Cloud Computing Backups

With more and more work being done "in the cloud" with web-based applications that store data on a remote server, edited through a web browser (or specialized client application), you'll want to backup the remote data locally.

The way to do this is to export the data using a tool that automates the process. For example, Google Docs Backup.

One of the nice things about Application Service Providers is that they save you from installing and updating software. The big risk is that they'll upgrade and leave your older documents unusable.

Legacy data in traditional backup scenarios is managed in two ways: one is pickling, where an entire system and software stack is retained to read the data. Another way to manage legacy data is to convert it to newer, more useful formats, or to older generic formats.

Cloud computing leaves you only the latter option. So, applications like Google Docs Backup try to convert the data to something generic.

How to Backup Email

The simplest way to backup an IMAP email account, like a Gmail or AOL Mail account, is to use desktop client software.

Two popular services, Hotmail and Yahoo Mail, don't support IMAP, so, you're kind of "out of luck" with them.

The rest seem to support IMAP.

Two popular IMAP mail clients are Outlook Express (now called Windows Mail), and Thunderbird (from Mozilla). Both are nice because they allow you to create local files, and also save the email in industry standard formats like .eml and .mbox.

You can also script Outlook Express to do some of your dirty work.

The typical backup solution is to create local folders -- folders that aren't stored on the server -- and copy the server's data into these local folders.

There are also IMAP sync tools that copy all the data from one IMAP account to another. These vary in speed, and most aren't fast enough for frequent backups, but they can be used to copy the data over.

If you don't have a second IMAP server (and you probably don't), consider using something like Debian to set up an internal mail server that's used only to hold backups.

How to Backup Google Docs

TBD

How to Backup MySQL on a Website

Usually, a web host will give you FTP access to a directory and a web interface. A button in the web interface will produce a .ZIP file with the database contents, and you can download it via FTP.

If you have shell access, and you run a Unix at home, and develop your own website, you can use this script. It dumps the remote database, and then loads it into a local copy of your database.

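#!/bin/bash
# Dump the remote database over ssh, then load it into the local copy.
# Stop at the first error, so a failed dump is never loaded.
set -e

echo
echo "Dumping db to db.mysql."
echo "Type your password."
# Quote the remote command so the remote shell receives it intact.
ssh launion@webhost.net "mysqldump -u uname '-p--remotepw---' database_name" > db.mysql
echo "Loading"
mysql -u root '-p----password----' database_name < db.mysql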

It's really just two lines of typing, but having it scripted is nice.

How to Backup Websites

TBD

Database Backup

The correct way to backup a database is to use a "mirror" or "replica" of the database. That's a server that's running a duplicate of the database, and, perhaps also acting as a load-balancing server.

These two servers are connected by a network, and as requests come in to the main server, they are either performed on the main server, or passed on to the mirror. Update and delete operations are carried out on the main server, then executed on the mirror.

A typical scenario is to reserve a single IP address for hosting the service (on LAN). The main database has this address. It also has a second IP address for inter-database communications. The second database has only the inter-db communications IP address. If the main database fails, the second machine "takes over" the IP address.

This failover is combined with database replication. The free MySQL server has this feature, which is described under "replication" in its documentation.

This provides maximum fault-tolerance.

A simpler backup, one that doesn't have the advantages of a mirror, is to dump the contents of a database to text files, and back those up. This is obviously a lot cheaper than purchasing a second database server.

If you opt for the latter method, make sure that you can build a database server and load the data quickly.

For archival purposes, you may want to make a database dump regularly, compress it, and have it backed up with the rest of the files.
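A regular compressed dump might look like the sketch below. The function names are made up for this example, and it assumes that mysqldump can read its user name and password from ~/.my.cnf, so that no password appears on the command line.

```shell
# Hypothetical helper: name a compressed dump after the date and database.
# Usage: dump_name DATABASE [YYMMDD]   (the date defaults to today)
dump_name() (
    db=$1
    stamp=${2:-$(date +%y%m%d)}
    printf '%s-%s.sql.gz\n' "$stamp" "$db"
)

# Dump one database and compress it into a folder that the regular
# file backup already covers.
# Assumption: credentials come from ~/.my.cnf.
# Note: the pipeline reports gzip's exit status, so a failed dump can
# still leave a small file behind. Check the size of the result.
archive_dump() (
    db=$1
    dir=$2
    mkdir -p "$dir"
    mysqldump "$db" | gzip > "$dir/$(dump_name "$db")"
)

# Example (the folder is a placeholder):
#   archive_dump database_name "$HOME/backups/db"
```

The dated names mean each run adds a new file instead of replacing the last one, so old dumps need to be pruned by hand or by the backup software.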

If you're backing up a database on a website, see the article How to Backup MySQL on a Website.

Attachment: database.gif (13.02 KB)

Failure

Computers fail, eventually.

There are hardware failures, and software failures.

Hardware is nice, and tends to fail one part at a time. So, if your system breaks, you can replace the part, and be operational again.

That's assuming that you can still purchase the part. You may have a spare, but, does it still work? Some components, like electrolytic capacitors, can age and fail.

Software failures are harder to detect, and sometimes, software failures are invisible when they happen, but manifest symptoms later, with bad data being revealed.

Full Backup

A full backup is a backup of all the files. It's used in contrast to the incremental backup, which is a backup of files changed since the last backup.

A common problem with full backups of networks or large file servers is that they take a long time. Backing up 300 gigabytes of data can take over half a day (over a 1-gigabit Ethernet network, to a SATA 3, RAID 5 NAS box).

So, full backups are typically scheduled to run over the weekend, when fewer people are using the network.

If there's too much data, a full backup may not be possible. The only solution is to split the file system into separate branches, and backup the branches on different days.
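To judge whether a full backup fits into a weekend, it helps to turn the figures above into a rate. The sketch below is back-of-the-envelope integer arithmetic; the function names are invented, "half a day" is taken as 12 hours, and 1 gigabyte is counted as 1024 megabytes (all assumptions).

```shell
# Effective throughput in MB/s for GB gigabytes backed up in HOURS hours.
# Usage: throughput_mb_s GB HOURS
throughput_mb_s() {
    echo $(( $1 * 1024 / ($2 * 3600) ))
}

# Hours needed to move GB gigabytes at RATE MB/s, rounded up.
# Usage: hours_needed GB RATE
hours_needed() {
    echo $(( ($1 * 1024 + $2 * 3600 - 1) / ($2 * 3600) ))
}

# Example:
#   throughput_mb_s 300 12    # 300 GB in 12 hours
#   hours_needed 1200 7       # 1200 GB at 7 MB/s
```

With these numbers, 300 gigabytes in 12 hours works out to about 7 MB/s, well below what a gigabit link can carry. At that rate, 1200 gigabytes would need about 49 hours, which no longer fits between Friday night and Sunday night. That is the point where splitting the file system into branches becomes necessary.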

Full backups are performed in conjunction with incremental backups, usually scheduled to run once a day in the evening. A typical schedule is to perform one full backup each month, and then perform an incremental backup each evening.

Generally, it's bad to schedule full backups that fall on the 1st, last, and 15th day of the month, because those are "paydays" and it's possible that accountants may need to use the computers. (I think that means the second and third weekends are best.) That said, backups are important enough to run even if someone's working on the weekend.

Incremental Backups

An incremental backup is a backup of all the files that have changed since the last backup. Typically, you perform a full backup, then a series of incremental backups.

You can perform full and incremental backups using tools like Backup Exec or NTBACKUP.EXE. All commercial backup products can do incrementals.

Incremental backups take far less space than full backups, and also take a lot less time to perform. In some situations, it's feasible to run backups during the workday.

Restoration of files from an incremental backup is performed by restoring the latest version of a file. This is done automatically.

Generally, previous versions can also be restored, so incremental backups also serve as a way to archive changes to the file system.

In many cases, incremental backups are better for archives than full backups. For one, files that are created and then deleted in the interval between full backups are never captured by a full backup, but a daily incremental catches them. The problem is, basically, size: a full backup is the same size as all the files, so keeping incrementals as well requires the full size, plus space for all the incrementals.

Incremental backups after periodic full backups are the preferred way to perform backups.
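The idea behind an incremental backup can be sketched with the -newer test of the find command. This is an illustration of the principle, not a replacement for real backup software: the function name and the stamp file names are invented, the destination must be an absolute path outside the source folder, and file names containing newlines are not handled.

```shell
# Hypothetical sketch: copy the files changed since the last run into
# a labeled folder. A stamp file records when the previous run started.
# Usage: incremental_backup SOURCE DEST LABEL   (DEST must be absolute)
incremental_backup() (
    src=$1
    dest=$2
    label=$3                       # e.g. a YYMMDD date
    stamp="$dest/.last-backup"
    mkdir -p "$dest/$label"
    cd "$src" || exit 1
    touch "$dest/.this-backup"     # mark the start of this run
    if [ -e "$stamp" ]; then
        find . -type f -newer "$stamp" -print
    else
        find . -type f -print      # no stamp yet: this is the full backup
    fi | while IFS= read -r f; do
        mkdir -p "$dest/$label/$(dirname "$f")"
        cp -p "$f" "$dest/$label/$f"
    done
    mv "$dest/.this-backup" "$stamp"
)

# Example (paths are placeholders):
#   incremental_backup "$HOME/Documents" /path/to/backups 090815
```

The first run finds no stamp and copies everything, which makes it the full backup. Each later run copies only what changed since the previous run began, so a file edited while a backup is running is picked up again next time.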

Inverted, Multiple Backups with USB Flash Memory

When you have multiple computers, you might want to put your files on a USB flash disk (aka a thumb drive or jump drive), and backup data to your computer's disks.

Create a folder called "backups", and a folder within it called "usb", and copy your files into there. If you have automatic sync software, you can set it to backup the data frequently. Hard disks are so fast you won't even notice the backups.

Set this up on all your computers, and your risk of data loss is nearly zero.

Just let the data "settle down" and force a file copy before removing the USB flash memory.

You could also set up a similar scheme with a USB external disk. The only issue is that moving hard disks tends to damage them and lead to data loss.

If you have important data, consider using encryption software. Losing the USB flash drive with your vital data would be bad.

Secondary Backups

It's a good idea to run two sets of backups. For one, it's possible that the backup software can fail, leaving some data unsaved. It's also difficult to check the backups every day, so it's likely that some minor glitch could lead to several days without backups. You could run out of space on the backup device. A device may go offline and stay offline for no discernable reason.

Going more than one or two days without a functioning incremental backup is unacceptable. As more work is lost, there's a "network effect" where people depend on other people's work, and you have to involve more people in the disaster recovery effort.

A cheap way to avoid this problem is to run two sets of backups - a primary and a secondary backup - with the full backups staggered, and with longer runs of incremental backups on the secondary backup. Store the secondary backup on a different device (a hard disk in your PC is a good place).

This way, if the system fails on the primary, you can use the secondary to recover. If the secondary fails, you have the primary.

In my experience, you can run two backups and check them twice a week, and there is never a situation where both backups are failing, but there's occasionally a problem with one of the backups.
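Checking the backups is easier if a script does the looking. The sketch below reports whether a backup folder has received any file recently; the function name and the two-day default are assumptions for this example, the default echoing the "one or two days" limit above.

```shell
# Hypothetical check: succeed if DIR holds a file modified within the
# last DAYS days, otherwise complain and fail.
# Usage: backup_is_fresh DIR [DAYS]
backup_is_fresh() (
    dir=$1
    days=${2:-2}
    recent=$(find "$dir" -type f -mtime "-$days" -print | head -n 1)
    if [ -n "$recent" ]; then
        exit 0
    fi
    echo "no backup newer than $days days in $dir" >&2
    exit 1
)

# Example: check the primary and the secondary (placeholder paths).
#   backup_is_fresh /path/to/primary
#   backup_is_fresh /path/to/secondary
```

A recent file only shows that something was written, not that the backup is complete or restorable, so an occasional test restore is still worthwhile.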

Subversion (svn) as a Backup

If you're programming (or managing programmers), you can use Subversion or any other revision control system as a backup.

If you're not using a revision control system at all, it's a good idea to start immediately.

Not only is the repository (the system where the code is stored) a backup of the code - it's also a way to roll back changes to your code. All the popular systems also enable team programming over the internet.

The few hours spent learning the system pay off many times over in saved time and mitigated risk.