Making Wiki Work With Search Engines

Did you know that Google.com is the only major search engine that knows how to index your Wiki content? Egads! This is horrible. The trouble is that Wiki URLs call a CGI script with a query string, and most crawlers skip those. This page describes how to make your Wiki pages look like plain-old-web-pages.html!

First, these instructions work only if you are using Apache with mod_rewrite and Cunningham's Wiki.

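If you're on a Unix box, you're probably using Apache. To make sure, use Netscape to browse to a nonexistent page, like http://www.riceball.com/nonexistent.file.txt, and read the error page that comes back; it usually names the server software. It has to be Netscape, since some other browsers substitute their own error page.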

Then, you have to check that the web server can load the mod_rewrite module. One quick check is "/usr/sbin/httpd -l", which prints out a list of the modules compiled into the server. Look for mod_rewrite.c in that list.

If it's not in the list, it may still be a dynamically loaded module. Look in /usr/local/lib/apache or /usr/lib/apache.
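The two checks above can be rolled into one small script. This is only a sketch: the httpd path and the two library directories come from the text, while the file name mod_rewrite.so and the helper name lists_mod_rewrite are assumptions, so adjust them for your system.

```shell
#!/bin/sh
# Succeeds if a module listing on stdin (the output of "httpd -l")
# mentions mod_rewrite. The function name is made up for this sketch.
lists_mod_rewrite() {
    grep -q 'mod_rewrite'
}

HTTPD=/usr/sbin/httpd   # path used in the text; yours may differ

if [ -x "$HTTPD" ] && "$HTTPD" -l | lists_mod_rewrite; then
    echo "mod_rewrite is compiled in"
else
    # Not compiled in (or no httpd at that path): look for a loadable module.
    # The file name mod_rewrite.so is an assumption about your build.
    found=no
    for dir in /usr/local/lib/apache /usr/lib/apache; do
        if [ -f "$dir/mod_rewrite.so" ]; then
            echo "loadable module: $dir/mod_rewrite.so"
            found=yes
        fi
    done
    [ "$found" = yes ] || echo "mod_rewrite not found in the usual places"
fi
```

If the script finds nothing, that does not prove the module is missing; it may simply live somewhere else on your box.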

Next, you will need to be able to create the rewrite rules. This can be tricky. If you have access to httpd.conf, you can put the rules in there. Generally, that won't fly, because the last thing a sysadmin wants is you mucking through the main config files for a per-site issue. So, you'll have to create a .htaccess file in your web server's document root.

In the .htaccess file you need to add the following lines:

RewriteEngine on
RewriteRule wikiwiki/([A-Za-z0-9]+)\.html$ cgi/wiki.cgi?$1

This will cause the server to rewrite URLs that look like:

http://www.riceball.com/wikiwiki/RiceWiki.html

into the following:

http://www.riceball.com/cgi/wiki.cgi?RiceWiki
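If you want to see what the pattern does before touching the server, you can try it from the shell. The sketch below imitates the rule with sed; it is not mod_rewrite itself, and the function name rewrite_path is made up for the example. The path is given without its leading slash, which is how a rule in a document-root .htaccess file sees it.

```shell
#!/bin/sh
# Imitate the RewriteRule with sed. The leading ".*" mimics mod_rewrite
# replacing the whole path, not just the part the pattern matched.
rewrite_path() {
    printf '%s\n' "$1" |
        sed -E 's#^.*wikiwiki/([A-Za-z0-9]+)\.html$#cgi/wiki.cgi?\1#'
}

rewrite_path "wikiwiki/RiceWiki.html"    # prints cgi/wiki.cgi?RiceWiki
rewrite_path "wikiwiki/Rice-Wiki.html"   # no match, so printed unchanged
```

The second call shows the limit of the character class: a page name with anything other than letters and digits is not rewritten.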

(If you cannot decipher this, check out http://httpd.apache.org/docs/misc/rewriteguide.html - the mod_rewrite guide.)

If this isn't working, the server may need to be reconfigured to allow you to use mod_rewrite from a .htaccess file (that is controlled by the AllowOverride setting in the main config). You may also need the Apache FollowSymLinks option turned on.

Once this is set up and working, you need to modify the Wiki script to use these new URLs instead of the ones that call the CGI directly. I changed two lines in the script.

First, add this line:

$HTMLUrl = "http://www.riceball.com/wikiwiki/";

And then modify the lines that write the links, in AsAnchor, like this:

defined $db{$title}
# ? "$title<\/a>"
? "<a href=\"$HTMLUrl$title.html\">$title<\/a>"
: "$title?<\/a>";
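Once the script is changed, it is worth checking that the pages it generates really do use the new URLs. Here is a rough way to do that from the shell: feed a page's HTML to a filter that prints any line still pointing at the CGI script. The function name leftover_cgi_links is made up, and fetching with curl is an assumption about what you have installed.

```shell
#!/bin/sh
# Print every line of HTML on stdin that still links to wiki.cgi directly.
# "|| true" keeps the exit status at zero when nothing is found.
leftover_cgi_links() {
    grep 'cgi/wiki\.cgi?' || true
}

# Example use against a live page (needs curl and network access):
#   curl -s http://www.riceball.com/wikiwiki/RiceWiki.html | leftover_cgi_links
```

Empty output means no direct CGI links are left on that page. Some hits may be fine, for example edit links that you chose to leave pointing at the script.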

As a final step, you might want to add a robots.txt file to your site. This can be used to tell robots not to index things inside your cgi directories. Generally, they don't by default, but sometimes they will, and will hose your server with an unintentional denial-of-service (DoS) attack! Here's the file. It goes into your document root.

User-agent: *
Disallow: /cgi/
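To convince yourself the file does what you want, you can test a few paths against it. The sketch below is a simplification: it treats every Disallow line as a plain path prefix and ignores User-agent groups, which is enough for a two-line file like this one. The function name is_disallowed is made up for the example.

```shell
#!/bin/sh
# Succeeds if the path in $1 starts with any Disallow prefix found in the
# robots.txt text on stdin. User-agent groups are ignored in this sketch.
is_disallowed() {
    awk -v path="$1" '
        tolower($1) == "disallow:" && $2 != "" && index(path, $2) == 1 { hit = 1 }
        END { exit !hit }
    '
}

if printf 'User-agent: *\nDisallow: /cgi/\n' | is_disallowed "/cgi/wiki.cgi?RiceWiki"; then
    echo "blocked"      # this branch runs: the path starts with /cgi/
fi
```

The rewritten pages under /wikiwiki/ do not start with /cgi/, so robots are still free to fetch them.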

That should do it!

These tips can be applied with some care to other Wikis in other languages.

(Yes, this degrades performance significantly!)