Recently I was trying to make a program that would average the input of thousands of integers in a file and print the result. Furthermore there were thousands of files I needed to average. I wondered what would be the fastest way to do this? I could use more threads but wouldn’t memory I/O become saturated by just one or even two threads? To test this I made two programs AvgFileCreator.c and AvgMultFile.cpp. AvgFileCreator is a simple tool that writes sequences of integers into files as binary data. AvgMultiFile.cpp actually averages the integers in the file using the number of threads you specify on the command line.
I made 16 large files (~25 mb each) and timed how long it took to complete using different numbers of threads with the following command. I tested it on my 2012 Retina Macbook Pro that has 4 cores.
time ls <the_test_files> | <AvgFileMulti.out> <number_of_threads>
I ran the command several times before I began taking averages so cache misses, page faults would be negligible. I averaged the results over 10 trials for 1, 2, 4, 8 and 16 threads and found the following results.
For my purposes with the averaging program, this graph is the end of the story. Four threads is the fastest way to do it. However I was still curious if this meant that four threads was also the fastest way to read memory so I removed the averaging code and took a more time samples.
The results are not particularly surprising. Throwing more threads when trying to read in memory is only so helpful. You can probably get a speed boost by using more than one thread but not many after that. The fastest for my computer was 2 threads. Even 3 threads was slower.
The take away here is that the memory speed is well balanced with the number of cores for my test machine and I suspect the same for most well designed computers. If my computer had many more cores it would have been a waste because the memory is too slow. On the other hand if my computer had very very fast memory it wouldn’t matter because there wouldn’t be enough cores to process it any faster. All of this leads me to the conclusion that the fastest way to do a simple task in parallel with data in memory is to use all your cores and no more. A well designed computer will be bottlenecked by both the parallel processing power it has and the speed of it’s memory.
Checkout the bitbucket repo AvgFileFun for the code and readme.
From the movie Tron:Legacy
A few years ago as I had just started tinkering with Arduinos, I watched the dazzling movie Tron:Legacy. It’s not a grade “A” movie but the visuals are unbelievable. The idea of using an Arduino to create something that could capture someone’s fascination like the effects in the movie popped into my head. I forgot about the idea for a awhile but then recently stumbled across the TLC5940 chip that makes using large numbers of PWM outputs a breeze (read “Dimmable LED controller” if you’re unfamiliar). I created an Arduino powered array of 32 individually dimmable LEDs to display light shows. I put the display inside a picture frame and now it’s a constant source of intrigue. In this post I’ll share both how I made the device and the code LightArray runs so you can build one yourself.
But first a demo:
I recently made a small suite of programs that sorts words in a file in theoretically less than n*log(n) time and outputs only unique words. The master program works by piping the words to two “sort” programs to sort half of the words in the file. Finally another program merges the output of the sorts. It’s not terribly complicated but it’s a fun little program and the code came out very clean even for c. The program is a good introduction to forking, exec-ing, and piping. Heres a schematic of the program called Uniqify.
I say the program could theoretically sort faster than n*log(n) because it uses two different sort processes. If the OS was smart it would put these processes on different cores and thus do some of the sorting work simultaneously. In reality though, you will need a large input file to overcome the lost time in creating new processes and merging the sorted output back together.
Note: You must name the executable files correctly. Start the program with “./Uniqify”. It gets input from stdin so you’ll probably want to redirect an input file. Same for stdout.
Here is the source code:
Shell script that will compile everything and name it correctly: cmpall.txt
The DC3 algorithm is a powerful linear time suffix sorting algorithm. If you’re not familiar with suffix sorting, it is the process in which you sort the different parts of a single string. For example banana suffix sorted would be:
Efficient suffix sorting is needed in data compression, and substring matching. Practically speaking, suffix sorting can be used to search for text in an ebook, identifying long repeated sequences of DNA in a genome, etc.
The most obvious way to suffix sort is to use a n*log(n) comparison algorithm to compare all the suffices of a string. However in the last 20 years, faster linear time suffix sorting algorithms have been developed. These algorithms beat the n*log(n) barrier by avoiding redundant sorting associated with suffix sorting. For example, from the banana suffix sort above, if you have sorted “ana” and “a”, there is no reason to compare the entire suffices “nana” and “na”. Once you determine that they both start with “n”s you should just use the result of the “ana” and “a” sort. The DC3 algorithm uses this rationale to create an elegant solution for linear time suffix sorting. I’ll explain this algorithm by first exploring the process behind it and then by examining a java implementation I created.
Data Compression is one of the key components that has allowed an explosion in communications and the ability to store massive amounts of data.
There are many ways to compress data, however, one of the most elegant and effective is the Burrows-Wheeler Transform. In this article I will explain how the Burrows-Wheeler Transform works and offer an easy to understand java implementation.
An Overview of Burrows-Wheeler
The Burrows-Wheeler Transform is a text transformation that takes an input string and produces an output string with a high amount of repeating characters. For example the word “banana” is transformed into “annb$aa” ( the “$” will be explained shortly). This output can be efficiently compressed.
The magic of Burrows-Wheeler is it’s ability to create output that contains long runs of identical characters and it’s ability to transform the output back to it’s original state. I’ll explain how this works in two sections, Burrows-Wheeler Forward and Burrows-Wheeler Back.
Recently I checked my server’s access log to see who had been visiting my site. There is not a ton of content yet so I was shocked to see that I had over 3,872 requests since it’s launch. I can understand; I’m a pretty big deal, right?
Unfortunately a lot of the page requests were from crawlers. Your first thought may be “Google bots”, however they were only a minor player. Most of the web crawlers were “hack” bots, whose sole purpose was to break into my site. The requests would almost always be looking for unsecured parts of the site. Here’s what a typical part of the access log looks like:
Most of the requested files don’t even exist. The bot is just guessing random admin-esque urls. There are a ton of these attacks. I took the time to compile some basic info on them and found out a few interesting things.
- The total amount of requests related to attacks was about 20% of all requests.
- Most attacks made about 40 requests before giving up ( None succeeded)
- Most attacks lasted between 5 and 20 seconds
- Attacks originated from the following countries:
- USA, Poland, Singapore, Canada, South Korea, Japan, Australia, Taiwan, Brazil, Thailand, Germany, Venezuela
- The U.S. had the most attacks, 3.
- Interestingly some of the worlds largest countries by population were absent including: China(1), India(2), Indonesia(4), Pakistan(6), Nigeria(7), and Russia(8). (Population is not a good predictor of the “attacking” countries, however, GDP per capita is. Nine of the thirteen “attacking” countries had GDP per capita in the top 25.)
I remember when I installed the server there were a lot of disclaimers about maintaining security. When reading them, my first thought was always something like “Who’s trying to get into my tiny website? Not many people will even know I’m here.” I was wrong. The web is filled with crawlers, many of which are scanning for a free server. Check out a snippet from my access_log to get real glimpse at what even a tiny server like this one gets requests for.
Finally! This website is working like a well tuned machine. It’s been frustrating at times, but making making this website has been a lot of fun. I thought it’d let you guys know how I did it.
Starting from scratch:
I realize I could have chosen a web hosting service for a wordpress site. There are plenty of them and many come almost completely configured. However, that would have taken all the fun out of making my own! I started from scratch with my old desktop that has been gathering dust since I left for college and hooked it up to my home internet connection.
Setting up the machine:
First I got rid of windows and installed ubuntu 11.10. Next I installed XAMPP(an Apache server), proftp, and wordpress. Everything seemed to be going great, but I felt uneasy about not knowing enough UNIX. Particularly there are a lot of security risks associated with hosting a webserver. In order to understand what’s going on you need to know UNIX decently well. To put my anxiety to rest, I read two books: the ubuntu pocket guide and UNIX in a Nutshell. Both are great books and really helped me understand how UNIX works ( not to mention a great warm up for my upcoming Operating Systems class) After that I felt pretty confident in my website setup; it’s secure and runs smoothly.
The Dreaded DNS Configuration:
Everything I read about this said the DNS configuration, the process of making the website name, localhost, correspond to my IP address, would be terrible. It certainly wasn’t simple like the famous 5 minute wordpress install, but it was not the end of the world. I changed my firewall settings, set-up a static local IP address for my server and set my server to accept external requests. I used DynDns to register my domain and host the dynamic dns service to keep track of my ever changing IP( your home IP address is changed frequently as a result of your ISP dynamically generating IP addresses). I installed the DynDns update client(it tells DynDNS where to find your server) on my server and changed my wordpress site name to match my new domain name. About two hours after applying the domain name changes I got on my phone’s web browser and tested my site from outside our home wifi. It worked!
Keeping it humming along:
Finally, to keep the server running 24-7 I modified the BIOS to reboot after any power failures and scheduled all the necessary processes to startup when the computer restarts. When I need to login to the computer, I just use ssh. It just needs to be plugged in!
In theory the set-up is very straightforward but in reality it took a lot of head banging to make it work. Missing a detail in a configuration file could be miserable. The little errors slowed down the process a lot. However I’m really glad I did it. I have a new appreciation for UNIX and have come to really respect it’s power. My advice to any one looking at making their own website is to learn why things work. You can get going pretty quick with the manuals and how-to guides but once you hit a bump in the road you’re likely to be in way over your head.
I’m pretty excited that I was selected to be on the IFC exec board. I had a great time on the minor board and am happy I can keep working for an organization I really like. The VP of Communications has traditionally been a passive role on exec, however I plan to change that. I want to focus on event publicity and advertising rush. I’ve already come up with a simple and clean ad campaign we can use in the fall. Here is one of the photos. I’ll post the rest later. Click the image to see a better version!