Finding anagrams in a list

I was reading quora and came across a post about someone who was rejected from facebook for a sloppy answer. The question was to make a list of words where the anagrams were grouped at the top.

People consoled the guy.

The answer seemed like a unixy thing. It's not that hard, the trick is to create an index for each word, where the index is the word, sorted by letter. So "cat" gets an index of "act".

Then you do some unix magic and get a list of anagrams.

The code to add that index is this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int comparison(const void * a, const void * b) {
    if (*(char*)a==*(char*)b) return 0;
    if (*(char*)a>*(char*)b) return 1;
    return -1;
}

void main(int argc, char *argv[]) {
    char word[1024];
    char sorted[1024];
    size_t len;
    while(fgets(word, 1023, stdin) ) {
        len = strlen(word) - 1;
        if (len&lt;=0) continue;
        if (word[len]=='\n') {
            word[len]=(char)0;
            len--;
        }
        strcpy(sorted,word);
        qsort(sorted, len, 1, comparison);
        printf("%s %s\n", sorted, word);
    }
}

My C was pretty rusty, and got stumped by the pointers in the comparison. Thx to the internet for examples.

Then you can get a list of anagrams. You do some shell magic with the commands sort, cut, uniq and join:

cat /usr/share/dict/words | grep -v "'s" | ./a.out | sort > /tmp/sword
cat /tmp/sword | cut -f 1 --delim=' ' | uniq -c | grep -v 1 | cut -f 8 --delim=' ' > /tmp/alist
sort --key=1f -r /tmp/sword | sort --key=1f > /tmp/ssword
join --nocheck-order /tmp/salist /tmp/ssword | cut --delim=' ' -f 2  > anagrams

I don't think the step to produce ssword was necessary, but I was having trouble using join until I found the --nocheck-order option.

The problem was that sort didn't seem to sort words ending in 's' to join's liking.

Back to the real job...

AttachmentSize
anagrams.26.44 KB