
Unix Text File Database

Unix usually comes with a set of tools for manipulating tab-delimited data files. Since I never really bothered to learn them, I figured I'd play with them and take notes.

cat - everyone knows this - concatenates files together. cat t1 t2 outputs t1 followed by t2.

paste - like cat, but for columns. With paste t1 t2, if t1 and t2 each contain one column of data, each output row is the t1 line and the t2 line stuck together, separated by a tab. Ex:
bash-2.05$ cat > t1
cat
dog
bird
bash-2.05$ cat > t2
meowmix
purina
seed
bash-2.05$ paste t1 t2
cat meowmix
dog purina
bird seed
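Two paste options worth knowing: -d picks the character placed between columns, and -s pastes each file's lines into a single row instead. A small sketch that recreates t1 and t2 with printf rather than typing them into cat:

```shell
# Work in a scratch directory so nothing existing gets overwritten.
cd "$(mktemp -d)" || exit 1

# Recreate the two one-column files from the session above.
printf 'cat\ndog\nbird\n' > t1
printf 'meowmix\npurina\nseed\n' > t2

# Default: the columns are joined with a tab.
paste t1 t2

# -d sets the delimiter; a comma gives simple CSV.
paste -d ',' t1 t2
# cat,meowmix
# dog,purina
# bird,seed

# -s ("serial") turns each file into one row: cat, dog, bird, tab-separated.
paste -s t1
```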

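cut - the opposite of paste: it extracts specific columns from the input. Fields are separated by tabs by default, so the lines below are typed with a tab between the two words.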
bash-2.05$ cat > t3
fish food
shoe clothing
hut shelter
bash-2.05$ cut -f 1 t3
fish
shoe
hut

You can also select by character position with -c, and give ranges of characters or fields:
bash-2.05$ cut -c 1-2 t3
fi
sh
hu
bash-2.05$ cut -f 1-2 t3
fish food
shoe clothing
hut shelter
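For data separated by something other than a tab, -d sets the delimiter. A sketch using a colon-separated file laid out like /etc/passwd (the file name t4 and its contents are made up for illustration):

```shell
# Work in a scratch directory.
cd "$(mktemp -d)" || exit 1

# Hypothetical colon-separated data.
printf 'fish:food:cheap\nshoe:clothing:pricey\nhut:shelter:cheap\n' > t4

# Second field only, splitting on ':'.
cut -d ':' -f 2 t4
# food
# clothing
# shelter

# A field list can mix single fields and ranges; the output
# keeps the input delimiter between the selected fields.
cut -d ':' -f 1,3 t4
# fish:cheap
# shoe:pricey
# hut:cheap
```

One gotcha: a line that contains no delimiter at all is printed whole, which is what happens if you run cut -f 1 on space-separated text. Adding -s suppresses such lines.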

comm - reports which lines are common between two files. Not sure how to use this yet.
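comm expects both inputs to be sorted, and prints three tab-indented columns: lines only in the first file, lines only in the second, and lines in both. The flags -1, -2, and -3 suppress the corresponding column, so comm -12 prints just the common lines. A sketch with two made-up pet lists (the file names mine and yours are hypothetical):

```shell
# Work in a scratch directory.
cd "$(mktemp -d)" || exit 1

# Both inputs must be sorted; the second list is sorted on the way in.
printf 'bird\ncat\ndog\n' > mine
printf 'cat\nfish\nbird\n' | sort > yours

# Lines in both files.
comm -12 mine yours
# bird
# cat

# Lines only in the first file.
comm -23 mine yours
# dog

# Lines only in the second file.
comm -13 mine yours
# fish
```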

join - like an SQL join: lines from the two files are matched on a shared key field (the first field by default). It's hard to explain in words, so here's an example.

bash-2.05$ cat > users
1 johnk
2 tarok
3 yurik
bash-2.05$ cat > tasks
1 sleep
1 fix things
2 sleep
2 take bath
3 sleep
3 cook
3 yell at taro
bash-2.05$ join users tasks
1 johnk sleep
1 johnk fix things
2 tarok sleep
2 tarok take bath
3 yurik sleep
3 yurik cook
3 yurik yell at taro
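Both inputs must be sorted on the join field; the session above works because users and tasks were typed in order. Lines with no match are dropped, as in an SQL inner join, while -a 1 also prints the unmatched lines of the first file, roughly a left outer join. A sketch with a hypothetical fourth user who has no tasks:

```shell
# Work in a scratch directory.
cd "$(mktemp -d)" || exit 1

printf '1 johnk\n2 tarok\n3 yurik\n4 nobody\n' > users
# Tasks typed out of order; join needs them sorted on field 1.
printf '3 cook\n1 sleep\n2 take bath\n' > tasks
sort -k 1,1 tasks > tasks.sorted

# Inner join: user 4 has no tasks, so it does not appear.
join users tasks.sorted

# -a 1 also prints lines from the first file that found no match.
join -a 1 users tasks.sorted
# 1 johnk sleep
# 2 tarok take bath
# 3 yurik cook
# 4 nobody
```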

tsort - topological sort. Not database-specific, but it can analyze graph data: given pairs of nodes, it prints an order that puts the first node of every pair before the second, and reports any cycles it finds. Added here because it's interesting.

bash-2.05$ cat > graph
a b
b c
c d
e f
f g
g h
h c
d a
bash-2.05$ tsort graph
e
f
g
h
tsort: cycle in data
tsort: a
tsort: b
tsort: c
tsort: d
d
a
b
c
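When the graph has no cycle, tsort prints one complete ordering with no warnings. A sketch with a made-up chain of build steps (the step names are hypothetical); a straight chain has only one valid order, so the output is predictable:

```shell
# Work in a scratch directory.
cd "$(mktemp -d)" || exit 1

# "x y" means x must come before y.
printf 'configure compile\ncompile test\ntest install\n' > steps
tsort steps
# configure
# compile
# test
# install
```

Pairs do not have to be one per line: tsort reads blank-separated items two at a time, and a pair of identical items (like "x x") just declares a node with no ordering constraint.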

Other useful commands: awk, grep, uniq, and sort.
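These combine with the tools above in pipelines. A sketch that counts tasks per user from the tasks file in the join example (recreated here, space-separated), once with sort and uniq and once with awk, plus a grep filter:

```shell
# Work in a scratch directory.
cd "$(mktemp -d)" || exit 1

printf '1 sleep\n1 fix things\n2 sleep\n2 take bath\n3 sleep\n3 cook\n3 yell at taro\n' > tasks

# Count lines per user id: cut out the id, sort it,
# then let uniq -c collapse each run and prefix a count.
cut -d ' ' -f 1 tasks | sort | uniq -c

# The same count with awk alone, using an associative array.
# awk's for-in order is unspecified, hence the trailing sort.
awk '{ n[$1]++ } END { for (id in n) print id, n[id] }' tasks | sort
# 1 2
# 2 2
# 3 3

# grep: which users have a "sleep" task?
grep ' sleep$' tasks | cut -d ' ' -f 1
# 1
# 2
# 3
```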