Make Meme/LOLCats Text with the GIMP

This tutorial explains how to make the “LOLCats” or “Meme” font so popular with the kids today. You can see a lot of these on the 4Chan /b/ channel, but don’t go there if you have any illusions that our society isn’t full of degenerates. The meme font is Impact Condensed, but to get the right “look”, you need to create a black outline around the regular font letters.

Magic Folder to Convert DOC and OpenOffice to PDF

Most of the information in this post is derived from http://www.tech-faq.com/convert-word-to-pdf.html. The code’s reposted here as a service, because the code there needs some editing.

Also, a useful thread about executing OOo macros is at http://www.oooforum.org/forum/viewtopic.phtml?t=2619.

Here’s the code to automate opening a file and saving it as PDF, using OpenOffice. Paste this into your standard macros.

Sub ConvertWordToPDF(cFile) 
	cURL = ConvertToURL(cFile) 
	' Open the document. 
	' Just blindly assume that the document is of a type that OOo will 
	' correctly recognize and open — without specifying an import filter. 
	oDoc = StarDesktop.loadComponentFromURL(cURL, "_blank", 0, Array(MakePropertyValue("Hidden", True)))
	' Swap the three-letter extension (.doc, .odt) for .pdf.
	cFile = Left(cFile, Len(cFile) - 4) + ".pdf"
	cURL = ConvertToURL(cFile)
	' Save the document using a filter.
	oDoc.storeToURL(cURL, Array(MakePropertyValue("FilterName", "writer_pdf_Export")))
	oDoc.close(True) 
End Sub 

Function MakePropertyValue( Optional cName As String, Optional uValue ) As com.sun.star.beans.PropertyValue 
	Dim oPropertyValue As New com.sun.star.beans.PropertyValue 
	If Not IsMissing( cName ) Then 
		oPropertyValue.Name = cName 
	EndIf 
	If Not IsMissing( uValue ) Then 
		oPropertyValue.Value = uValue 
	EndIf 
	MakePropertyValue = oPropertyValue
End Function 

sub test()
	ConvertWordToPDF("file:///home/johnk/Desktop/cooking websites article.odt")
end sub	

To use this macro, you should execute it from the command line. Here’s a shell script that helps.

#!/bin/sh 
DIR=$(pwd) 
DOC=$DIR/$1 
/usr/bin/oowriter -invisible "macro:///Standard.Module1.ConvertWordToPDF($DOC)"
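
The script above assumes the file name is relative to the current directory. Here’s a minimal sketch of a variant that accepts absolute paths too. The function name macro_url is made up for illustration, and Standard.Module1 assumes the macro was pasted into the default module.

```shell
#!/bin/sh
# Hypothetical helper: build the macro URL for one document.
# Assumes the macro lives in Standard.Module1.
macro_url() {
    case "$1" in
        /*) doc=$1 ;;            # already an absolute path
        *)  doc=$(pwd)/$1 ;;     # relative: prefix the current directory
    esac
    printf 'macro:///Standard.Module1.ConvertWordToPDF(%s)\n' "$doc"
}

# Usage (not run here):
#   /usr/bin/oowriter -invisible "$(macro_url "$1")"
```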

And last but not least, my addition – a script that watches a folder called pdeffer and converts DOC files to PDF. The one limitation is that only simple text documents with graphics convert properly. More complex layouts will fail.

To use it, execute it in the background or start it in a “screen” session. It’s not a full-fledged service (and won’t be).

#! /usr/bin/perl

chdir '/home/johnk/pdeffer';

use Cwd;

%docs = ();
%pdfs = ();

sub main() {
  print "checking\n";
  opendir DH,'.';
  while( $file = readdir(DH) ) {
    if ($file =~ /\.doc$/) {   # escape the dot so it matches a literal "."
      $docs{$file} = 1;
    }
    if ($file =~ /\.pdf$/) {
      $pdfs{$file} = 1;
    }
  }
  closedir DH;

  foreach $d (keys(%docs))  {
    print "$d\n";
    $p = $d;
    $p =~ s/\.doc$/.pdf/;      # replace only the trailing extension
    next if ($pdfs{$p});

    $filename = getcwd().'/'.$d;
    print "$filename\n";

    system('oowriter -invisible "macro:///Standard.Module1.ConvertWordToPDF('.$filename.')"');
  }
}

for(;;) {
  &main();
  sleep 10;
}
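
The test at the heart of that loop (a .doc with no matching .pdf) can also be sketched in shell. This is an illustration only, not a replacement for the Perl script; pending_docs is a made-up name, and the folder is passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper: list .doc files in a folder that have no matching .pdf.
pending_docs() {
    for doc in "$1"/*.doc; do
        [ -e "$doc" ] || continue        # the glob matched nothing
        pdf="${doc%.doc}.pdf"            # strip only the trailing .doc
        [ -e "$pdf" ] || printf '%s\n' "$doc"
    done
}

# Usage (not run here):
#   pending_docs /home/johnk/pdeffer
```

Suffix stripping with ${doc%.doc} only ever touches the end of the name, so a file like my.docs.doc maps to my.docs.pdf.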

Advanced Screen Scraping With wget (and Mailarchiva)

I was testing a new product called Mailarchiva, and I misunderstood the instructions. The upshot was that a mailbox full of messages was moved into Mailarchiva, and I wanted to restore them to the mailbox.

Mailarchiva comes with a tool to decrypt its message store, but it didn’t work. The problem was that the main product and the utility package got desynchronized, and the one tool I needed stopped working (because a method’s type signature changed). Also, despite being an open source project, they didn’t have sources for the utilities up on sf.net, so I couldn’t rebuild the program to make it work.

Not being a major Java programmer, I had a hard time coaxing the system to the point where it would run without an exception – problem was, the utility’s libraries expected one format for the message store, and the server’s expected another. It was getting really difficult.

I had some manually produced backups, but not of the current month. (I didn’t follow my own advice not to test with live data.)

You just can’t win, sometimes.

The solution, sort of, was to use the website downloader, wget, to interact with the app via its web interface, and use that to download the messages to files. Screen scraping.

First, I found a page with great examples:
http://drupal.org/node/118759#comment-286253

Then, a quick visit to the wget man page:
http://www.gnu.org/software/wget/manual/html_node/Types-of-Files.html#Types-of-Files

Here’s the short version of how to do it:

The first step is to figure out how to log in and get a cookie.

The second step is to figure out how to download the messages.

The third is to figure out the range of pages in the results, and then write a loop to recursively download the messages from each set.

Then, finally, copy the .EML files up to the server via Outlook Express.

Here’s the long version:

First, you have to submit a web form, and get a session id in a cookie. Here’s the command I used:

wget -S --post-data='j_username=admin&j_password=fakepass' http://192.168.1.103:8090/mailar...

192.168.1.103 is the IP address of my test installation.

The --post-data option lets you submit the login form, as if you were typing it in and submitting it. To find the URL to submit, you look at the source of the login form.

Then, you inspect the output, looking for the Cookie. Then, concoct a longer, more complex command to submit the search form:

wget --header="Cookie: JSESSIONID=62141726A04B7C8BDE24C32514EB19F3; Path=/mailarchiva" --post-data='criteria[0].field=subject&criteria[0].method=all&criteria[0].query=&dateType=sentdate&after=1/1/09 1:00 AM&before=12/18/09 11:59 PM&submit.search=Search' http://192.168.1.103:8090/mailar...

Note that we’re passing the cookie back.
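
Picking the session id out of the headers by eye gets old fast. Here’s a rough sketch of doing it with sed, assuming the server sends a Set-Cookie header for JSESSIONID like the one my install returned. The function name extract_session_id and the LOGIN_URL placeholder are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and print the
# value of the first JSESSIONID cookie (everything up to the first ";").
extract_session_id() {
    sed -n 's/^[[:space:]]*Set-Cookie:[[:space:]]*JSESSIONID=\([^;]*\).*/\1/p' | head -n 1
}

# Usage (not run here). wget -S prints the server headers on stderr,
# so they have to be redirected into the pipe:
#   wget -S --post-data='...' LOGIN_URL 2>&1 | extract_session_id
```

wget can also manage cookies itself with --save-cookies, --keep-session-cookies and --load-cookies, which may be simpler than passing the header by hand. I didn’t test that route against Mailarchiva.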

Inspecting the resultant file will reveal that the search worked!

Then, you try to download the attachments by spidering the links, and downloading files that end in .eml.

wget -r -l 2 -A "*viewmail.do*" -A "*downloadmessage.do*" -R "signoff.do" -R "search.do" -R "configurationform.do" --header="Cookie: JSESSIONID=62141726A04B7C8BDE24C32514EB19F3; Path=/mailarchiva" --post-data='criteria[0].field=subject&criteria[0].method=all&criteria[0].query=&dateType=sentdate&after=1/1/09 1:00 AM&before=12/18/09 11:59 PM&submit.search=Search' http://192.168.1.103:8090/mailar...

That pretty much does what I want, but I need to do it for a bunch of pages. The quick solution is to use the browser to find out what the last page is, and then write the following shell script:

for i in 1 2 3 4 5 ; do
wget -r -l 2 -A '*viewmail.do*' -A '*downloadmessage.do*' -R 'signoff.do' -R 'configurationform.do' --header='Cookie: JSESSIONID=62141726A04B7C8BDE24C32514EB19F3; Path=/mailarchiva' --post-data='criteria[0].field=subject&criteria[0].method=all&criteria[0].query=&dateType=sentdate&after=1/1/01 1:00 AM&before=12/18/09 11:59 PM&page='$i http://192.168.1.103:8090/mailar...
done

Note that a parameter was added to the post. It’s page.

A parameter was also removed, the submit value. Submitting the old value seemed to prevent the paging. There’s probably a branch in the code based on the type of “submit” you’re sending, because there are a few different buttons, with different effects.

Again, that’s discovered by reading the sources and experimenting.
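
If there are more than a handful of pages, typing out the list gets tedious. Here’s a sketch that builds the POST body for each page in a counting loop. The field names are copied from the command above (with no submit.search); the function names are made up for illustration, and the sketch only prints the bodies rather than calling wget.

```shell
#!/bin/sh
# Hypothetical helper: build the POST body for one page of search results.
# The page parameter is added; the submit.search parameter is left out.
post_data_for_page() {
    printf '%s' 'criteria[0].field=subject&criteria[0].method=all&criteria[0].query='
    printf '%s' '&dateType=sentdate&after=1/1/01 1:00 AM&before=12/18/09 11:59 PM'
    printf '&page=%s\n' "$1"
}

# Hypothetical helper: print one POST body per page, from 1 to the given last page.
all_page_bodies() {
    i=1
    while [ "$i" -le "$1" ]; do
        post_data_for_page "$i"
        i=$((i + 1))
    done
}

# Usage (not run here): all_page_bodies 5
# A real run would hand each body to wget via --post-data.
```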

So, I ran the script and waited a long time. Then, I shared the data via Samba (I coded this on a Linux box, but ran the application on Windows). A nice side effect was that the shared files displayed DOS 8.3 filenames. So, the messages, which were originally named “blah.do?id=21341342334.eml” became “BADJFU~5.EML”.

To upload, I used Outlook Express. Despite its bad reputation, OE is good at interacting with IMAP mailboxes, and its support for the .EML file format seems to be good.

Wget saved the day (but it was a long day).

Lesson learned. Or rather, “lessons refreshed” is really what happened.

I should have set up a test account, put mail into it, then archived it.

Additionally, I should remember that when dealing with “enterprise” software, it’s not going to work like Windows or Mac (or even Linux) software. Larger businesses are assumed to have certain processes that SOHO businesses don’t.

This would be a perfect application for a web service. It would avoid all the program execution problems. Instead of accessing the data through a command-line application, you’d access it over the network, using a simple interface.

Additionally, this kludgy rescue would have been impossible if the application had been written to use a Swing GUI or a native GUI. The web interface made it possible to scrape the data out of the system.

As for Mailarchiva – if you are trying to archive your own mail server, it seems to be a good product. The docs could use some work 🙂 I found others, but Mailarchiva running on a Linux box would probably be the most stable solution. The bad news is that it’s not intended for archiving personal email accounts like Gmail, AOL and ISP accounts. So, it wasn’t the right tool for me.

What I really need is a free/cheap archiver for products like Gmail. It would both mirror and archive the IMAP folders, but allow the user to hold on to emails for as long as they wanted. So far, what I’ve found either doesn’t do folders, or doesn’t do archiving. Archiving is just saving every single email it sees, and retaining messages even if they’re deleted.

RAID 5 Parity. What is it, and how does it work?

One morning, I started wondering how RAID 5 parity works to rebuild a disk array. It seemed “magical” to me, that you can get redundancy and still use most of your disk capacity. So I searched for it… and turned up not very much info, and one other person’s unanswered question. A few articles explained it, but in a little more detail about performance, and less detail about the actual parity function. This article attempts to fill the gap.
