Unix Sorting and Breaking n*log(n)

I recently made a small suite of programs that sorts words in a file in theoretically less than n*log(n) time and outputs only unique words. The master program works by piping the words to two “sort” programs ┬áto sort half of the words in the file. Finally another program merges the output of the sorts. It’s not terribly complicated but it’s a fun little program and the code came out very clean even for c. The program is a good introduction to forking, exec-ing, and piping. Heres a schematic of the program called Uniqify.

UniqifyOverview

I say the program could theoretically sort faster than n*log(n) because it uses two different sort processes. If the OS was smart it would put these processes on different cores and thus do some of the sorting work simultaneously. In reality though, you will need a large input file to overcome the lost time in creating new processes and merging the sorted output back together.

Note: You must name the executable files correctly. Start the program with “./Uniqify”. It gets input from stdin so you’ll probably want to redirect an input file. Same for stdout.
Here is the source code:

Uniqify.h
Uniqify.c
UniqParser.c
UniqMerger.c
Shell script that will compile everything and name it correctly: cmpall.txt

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>