Original Link: https://www.anandtech.com/show/1539
The AnandTech Linux XBOX PC Experiment
by Kristopher Kubicki on November 10, 2004 4:00 PM EST- Posted in
- Linux
Introduction
A few weeks ago, we started investigating the possibility of putting Linux on an XBOX. We played with some ideas in our heads: a render farm, a cheap office computer or a distributed crypto platform, just to start. The idea required a little bit of elbow grease, a mod chip, Linux and a bunch of free time.
All XBOXes are locked into only booting the Microsoft BIOS. That is, if you buy a new XBOX, its basic IO does not let you do all that much except read DVDs and XBOX games that have special keys encoded into them. A mod chip is a computer chip with another BIOS that physically overrides the Microsoft BIOS. Since the chip is now in charge of how the XBOX should bootstrap itself, it will allow the XBOX to recognize more discs and operations than those for which the XBOX was specifically designed. Thus, backup games or entire operating systems can be loaded onto the XBOX hard drive and run from there.
People have been modifying their XBOXes to run Linux for a long time now. The XBOX modification scene sprang largely from some extremely gray markets that began selling mod chips for backup games. However, after several years, the XBOX Linux community has grown very stable. XBOX runs on virtually off-the-shelf components; ergo, porting Linux to the XBOX was a no-brainer after the BIOS issue was resolved. Microsoft dropped the price of XBOXes a few months ago and BMMods approached us about the possibility of checking out Linux on an XBOX. With refurbed and used XBOXes as cheap as they have ever been right now, the stage was set for us to abuse as much XBOX as possible.
The goal today is to see if we can modify an XBOX successfully to do something useful that we can't do for the cost of the modification (aside from play XBOX). We will look at a media center, a basic PC and finally, take a look at the possibility of setting our XBOXes up in some sort of cluster, with detailed steps all along the way.
Costs
Unfortunately, the cost of building an XBOX PC runs a little more than the cost of the XBOX. We need to factor in the cost of the mod chip, probably a hard drive, keyboard and mouse. Mod chips run anywhere from $40 to $80; the one we use in this review costs about $75. A USB keyboard and mouse usually run another $15. If you are going to be doing any clustering, you do not really need to invest in a keyboard/mouse at all. For most uses, the 8GB hard drive is sufficient, although upgrading to a 20GB drive might be in order for a larger Linux distribution.
Used and refurbished XBOXes range from $120 to $160. Used XBOXes are usually the way to go, since we will soon be voiding the warranty anyway to install the mod chip. When shopping for an XBOX to mod, older is sometimes better. Although the SmartXX mod chip works on all versions of the XBOX available, the newest version 1.6 XBOXes require a few extra wires to be soldered, even on the solderless install kit.
For our distributed computing ideas, we have an exciting analysis in store. We managed to round up 8 XBOXes with mod chips for this review. That only equates to 5.8GHz of distributed CPU power, 80GB of hard drive space, and just 512MB of memory. However, if our distributed computing project is successful, scaling to a much higher CPU clock might be very feasible. Finding an equivalent $1600 PC would be nearly impossible, but that assumes our distributed XBOX network actually behaves like a $1600 PC instead of 8 $150 PCs. It may be the case that network and disk latencies are too high for us to practically compute anything. There are also some issues on power consumption and noise. The XBOX is relatively quiet for a PC, unless you have a whole lot of them. Our lab recorded approximately 42dBA when our eight node XBOX cluster was on.
If you plan on just running your XBOX as a stand-alone PC, then costs like power become no issue. The XBOX consumes 100W at full load. For a 16-node cluster to operate for one hour, we need 1.6 kilowatt hours of power. If you pay 10 cents per kWh, that's about $1400 for one year of continuous operation.
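The arithmetic behind that yearly figure can be sketched in a few lines of shell. The function name `annual_power_cost` is ours, not part of any tool, and the sketch assumes every node draws its full-load 100W around the clock, which makes it an upper bound rather than a measurement.

```shell
#!/bin/bash
# Hypothetical helper: yearly electricity cost for a cluster.
# Arguments: node count, watts per node, price per kWh in dollars.
# Assumes constant full-load draw for 24 hours a day, 365 days a year.
annual_power_cost() {
    awk -v nodes="$1" -v watts="$2" -v price="$3" 'BEGIN {
        kw = nodes * watts / 1000      # total draw in kilowatts
        kwh_year = kw * 24 * 365       # kilowatt hours over one year
        printf "%.2f\n", kwh_year * price
    }'
}

# 16 nodes at 100W each, 10 cents per kWh
annual_power_cost 16 100 0.10

# The eight-node cluster built for this article, same assumptions
annual_power_cost 8 100 0.10
```

Halving the node count halves the bill, so the eight-node cluster used here would cost about half as much to run under the same assumptions.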
Putting it all Together
Modding the XBOX is really the simple part. We received a solderless SmartXX v2 from www.bmmods.com. These generally run for $75 if you get the solderless adaptor or $60 for the solder option. Other mod chips work well for installing a Linux distribution, but the SmartXX comes with an XBOX version of Debian (Xebian) and is the most mature chip for running a Linux distribution.
Opening up the XBOX and getting at the mainboard was fairly simple. The SmartXX comes with PDF manuals that demonstrate how to unscrew all of the components. Below, you can see the process in a few quick steps. Opening the XBOX just requires an Allen wrench.
Opening the XBOX
The next step is to remove the hard drive and DVD drive so that we can get at the D0 pin hole on the motherboard. This will allow us to put the XBOX into debug mode.
Removing the drives
Now, we have the SmartXX finally mounted correctly. If you look carefully at the image below, you can see the copper wire from the pin pad to the D0 lead that gives our mod chip all the magical power.
Inserting the mod chip
Total time for our installation was about 15 minutes, although a solder option would probably take a little bit more than a half hour. Adding the mod chip to the XBOX was far simpler than any of us had originally thought, and after doing the process just once with a solderless option, we feel like we could easily do the process again with a solder option. This would save us a bit of cash if we were considering distributing our XBOX on a massive scale.
The SmartXX
SmartXX is unique in that it actually runs its own copy of embedded Linux. Before the XBOX has a chance to bootstrap its own BIOS, the SmartXX kicks in and bootstraps itself. The SmartXX chip then opens to the menu displayed before. Our SmartXX chip contains 4MB of memory, which can be configured to run various BIOS images to bootstrap the XBOX again. We can save multiple BIOSes in the built-in memory or revert back to the original Microsoft BIOS included with the XBOX. Below, you can see the BIOS and configuration chooser.
Inside the SmartXX configuration, we can actually specify and then terminal into the XBOX to configure options via a command line interface instead. We can also specify the default location from which the XBOX should boot; just another step while verifying that our install went OK.
Advantages and Drawbacks of the Design
The XBOX PC is just a 733MHz Pentium III with 64MB of RAM. 733MHz is extremely weak by today's expectations. 733MHz is not enough to run PC games today, barely enough to run Windows XP and certainly not enough to do anything practical except play XBOX. Or is it? Say what you want about Microsoft, they were onto something when they thought of the XBOX. The USB controllers, built-in hard drive and Ethernet all make for a surprisingly good platform to run basic computing: email, word processing, internet, etc.
There are also a lot of features that make the XBOX an intuitive design for a media center PC. XBOX has an integrated DVD player and a reasonably efficient controller (the game pad). The machine ships by default with composite cabling, but $20 at any video game outlet will get you a component cable package instead. With the exception of PVR functionality, the XBOX would make a pretty good media center. Spending a few dollars on an IR kit for the USB ports adds even more creature comfort.
Keep in mind, the Pentium III Coppermine used in the XBOX is a little different than a normal Pentium III. In fact, the XBOX Pentium only has half the L2 cache of a normal Pentium III, but the 8-way associative paths are left on the processor whereas on the Celeron variant, these paths are disabled. This puts expected performance between a Coppermine Pentium III and a Coppermine Celeron. You may wish to read up on Anand's analysis of the entire XBOX architecture from 2001. The P3 is old architecture - don't expect any miraculous performance out of this processor.
One of the better features of the XBOX is its small footprint in a "stackable" design. So many things about the XBOX just scream, "turn me into my own server rack".
With any distributed cluster, the importance of network latency becomes an issue. Our XBOXes only support a 10/100Mbps network adaptor, and that is far too slow for some serious cluster computation. With only eight nodes, we do not expect to see large latency issues, but maybe we are in for a surprise. Below, you can see a network transfer of a few hundred megabytes:
226 Transfer complete.
779669036 bytes received in 83.35 secs (9135.0 kB/s)
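To put that transfer in terms of the 100Mbps link, the byte count and elapsed time can be converted to megabits per second. This is only a sketch: `link_utilization` is a name we made up, and it uses decimal megabits (1,000,000 bits), which is how Ethernet rates are normally quoted.

```shell
#!/bin/bash
# Hypothetical helper: convert an FTP transfer summary into megabits per
# second and a percentage of the link's nominal rate.
# Arguments: bytes transferred, seconds elapsed, link rate in Mbps.
link_utilization() {
    awk -v bytes="$1" -v secs="$2" -v link="$3" 'BEGIN {
        mbps = bytes * 8 / secs / 1000000   # decimal megabits per second
        printf "%.1f Mbps (%.0f%% of %d Mbps)\n", mbps, mbps / link * 100, link
    }'
}

# Figures taken from the FTP summary above
link_utilization 779669036 83.35 100
```

By this estimate, a single node sustains roughly three quarters of the nominal link rate on a bulk transfer, before any protocol overhead is accounted for.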
Furthermore the single, dual channel PATA interface limits our ability to use the XBOXes as a high availability network attached storage (NAS) solution; there are not enough interfaces for us to run more than two hard drives - that assumes we take the DVD player out. The 100Mbps limitation on the network interface also dampens our thoughts of any NAS as well. The default XBOX hard drive only runs at 5400RPM; somewhat slow if we plan on doing a lot of disk access. There also seems to be an issue as to how effectively we can replace the 40-pin DMA33 cable with something a little more capable. We are mostly bogged down by network IO for most clustering applications, but the hard drive limitations could come back and bite us later on. As a small indicator of hard drive speed, we ran a few tests to see if it was worth replacing the hard drives in our XBOX cluster.
Unfortunately, getting UDMA100 to work does not seem possible with our configuration. Although we have seen various techniques in forums for squeezing that last bit of performance out of the hard drive by hacking your own cable, none of the methods that we attempted yielded faster bus speeds. The difference between the 5400RPM and 7200RPM drive looks too insignificant for us to continue using the 7200RPM drives in the cluster.
Thus far, we have decided our XBOX has the right configuration to run a very cheap, simple desktop/email PC, a stripped down media center or a distributed cluster that does not rely too heavily on network latency.
The Test and Initialization
Obviously, we want to benchmark a few things on our XBOX cluster, but we need some points of reference. To do that, we took a few systems that we had around the lab of various cost and power, and configured them with SUSE Linux 9.1. All of our benchmarks are run without X running, since we want to minimize the load on the systems. Below is a breakdown of several of the systems that we are looking at to compare our XBOX on a desktop level.
Various Cheap Desktops | |||
Desktop | XBOX Linux Cluster | Sempron 2200+ | Celeron 2.0GHz |
Processor | PIII 733MHz 128K L2 | Sempron 2200+ | Celeron 2.0GHz |
Motherboard | NVIDIA MCPX X3 | MSI K7N2G | ASROCK P4I45GV R5 |
Hard Drive | Seagate 5400RPM 8GB | Seagate 5400RPM 8GB | |
RAM | 64MB Shared PC3200 | 256MB PC2100 | |
Operating System | Xebian 1.0.3.2, SUSE 9.1 | SUSE 9.1 | |
Kernel | 2.4.26 (Xebian), 2.6.4 (SUSE) | 2.6.4 | |
Compiler | linux:~ # gcc -v Reading specs from /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/specs Configured with: ../configure --enable-threads=posix --prefix=/usr --with-local-prefix=/usr/local --infodir=/usr/share/info --mandir=/usr/share/man --enable-languages=c,c++,f77,objc,java,ada --disable-checking --libdir=/usr/lib64 --enable-libgcj --with-gxx-include-dir=/usr/include/g++ --with-slibdir=/lib64 --with-system-zlib --enable-shared --enable-__cxa_atexit x86_64-suse-linux Thread model: posix gcc version 3.3.3 (SuSE Linux) | ||
Cost | $210 (including mod chip) | $287 | $277 |
As you can see, it's pretty difficult to get a cheap AMD or Celeron system in the fray (cost includes case, CPU, motherboard, memory, hard drive, DVD Player and cooling). All configurations use integrated video. The Semprons and Celerons are going to be much faster than our 733MHz XBOX, although the footprints on the Sempron and Celeron setups are pretty terrible - they take up the full space of a mid-ATX case. The XBOX is also the significantly quieter solution. Unfortunately, the XBOX has very little memory and when we fire up X, the XBOX is really going to take a hit in performance.
As you may notice, we are still running SUSE 9.1 instead of 9.2; we haven't had the chance to validate and update 9.2 yet, and our server benchmarks were already done on SUSE 9.1 configurations with GCC 3.3.3.
Various Performance Configurations | |||
Desktop | XBOX Linux Cluster | Dual Opteron 250 | Dual Xeon 3.6GHz |
Processor | (8) PIII 733MHz 128K L2 | (2) Opteron 250 | (2) Xeon 3.6GHz |
Motherboard | NVIDIA MCPX X3 | Tyan K8W | SuperMicro X6DA8-G2 |
Hard Drive | (8) Seagate 5400RPM 10GB | Seagate 120GB 7200RPM IDE 8MB Cache | |
RAM | 64MB Shared PC2100 | 4GB DDR-400 | 4GB DDR2-400 |
Operating System | SUSE 9.1 | SUSE 9.1 | SUSE 9.1 |
Kernel | 2.6.4 | 2.6.4 | |
Compiler | linux:~ # gcc -v Reading specs from /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/specs Configured with: ../configure --enable-threads=posix --prefix=/usr --with-local-prefix=/usr/local --infodir=/usr/share/info --mandir=/usr/share/man --enable-languages=c,c++,f77,objc,java,ada --disable-checking --libdir=/usr/lib64 --enable-libgcj --with-gxx-include-dir=/usr/include/g++ --with-slibdir=/lib64 --with-system-zlib --enable-shared --enable-__cxa_atexit x86_64-suse-linux Thread model: posix gcc version 3.3.3 (SuSE Linux) | ||
Cost | $1680 (including mod chips) | >$4000 | >$4000 |
Our XBOX cluster costs considerably less than a performance workstation or server. The machines listed above were actually tested back in August, so we used the same OS and compiler setup from August. The eight-node XBOX cluster comes in at less than half the cost of the Opteron or Xeon setup, but of course, cannot offer us 64-bit capabilities, among other things.
Installing SUSE on the XBOX for the first time was not a simple task. The problem is that the 8GB hard drive shipped with each XBOX is "locked" (you may read more about this here) so that it may only be booted by its own XBOX. There is an old, hacky guide that details how to get the normal XBOX hard drive to boot on a PC so that we can cross-install an operating system. The basic gist of the article follows:
- Leave the hard drive connected via the IDE and power cable to the XBOX and boot the Microsoft BIOS
- Boot the PC, but pause the installer before the Linux kernel loads
- Hot swap the hard drive from the XBOX to the PC (dangerous)
- Install Linux near the end of the hard drive
- Disconnect IDE cable, reboot PC
- Connect IDE cable before kernel initializes, then continue with installation and configuration of the OS
- Power down PC and XBOX, then reconnect the hard drive to the XBOX as usual
In the SmartXX OS, we actually have a tool to unlock the hard drive completely (or lock it back up later). This saved us a ton of work and we simply unlocked all of the 10GB hard drives shipped with our system. Thanks, SmartXX!
Desktop Performance
There are several routes that we can take if we are just interested in converting our XBOX into a desktop. Our first option is to just boot the XBOX via the Xebian LiveCD bundled with the SmartXX mod chip. Xebian is a modified Debian LiveCD that comes with Freevo, Mozilla, GCC and a few other goodies. Choosing the Linux CD option in the SmartXX Linux boot brings us to this screen shortly before automatically launching X:
Welcome to the : Xebian
Version : 1.0.3.2-smartxx-edition
Author : Edgar Hucek (hostmaster@ed-soft.at)
Hostname : xbox.localdomain.local
Linux Ver. : 2.4.26
The first time that we ran Xebian without the Ethernet connected, the XBOX actually hung when we launched Mozilla. There is not much denying it - the XBOX PC is not any sort of workstation replacement. Performance benchmarks are not going to be very good at all, particularly compared with some other hardware solutions available today. However, for 200 bucks, the total system packs a "little" bit of a punch. For reference, we benchmarked a few small utilities here, just to give a point of reference on performance. Obviously, some of these systems use CPUs that cost more than the entire XBOX PC. Don't expect the XBOX PC to win any awards, but notice how well it performs for the price.
After running Xebian, we blew away the hard disk and installed a stripped down copy of SUSE 9.1 without the X window system. SUSE runs on the 2.6 kernel while Xebian runs on 2.4. Installing SUSE 9.1 was not very difficult; we cannibalized most of the modules and dependencies from Xebian and then essentially merged SUSE into the Xebian install. This gets a little messy, but provides us with a somewhat uniform platform for comparing our other benchmark machines. We compiled gzip from scratch using GCC 3.4.2 on both configurations. Below, you can see the machine gzip the same 700MB file that we use for our other gzip tests.
xbox:/mnt# time gzip 01.wav -c >/dev/null
We also decided to encode an MP3. Below, you can see the command that we used to encode the MP3; the playtime multiplier is listed in the graph.
# lame sample.wav -b 192 -m s -h - >/dev/null
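The article does not define the playtime multiplier. A common reading, and the one assumed in this sketch, is the length of the audio clip divided by the time taken to encode it, so that a higher number means faster encoding. The function name and the sample figures are hypothetical and are not measurements from our test systems.

```shell
#!/bin/bash
# Hypothetical helper: playtime multiplier for an encode run.
# Arguments: clip length in seconds, wall-clock encode time in seconds.
# Assumes multiplier = clip length / encode time (our reading, not the
# article's stated definition).
playtime_multiplier() {
    awk -v clip="$1" -v encode="$2" 'BEGIN {
        printf "%.2fx\n", clip / encode
    }'
}

# Illustrative numbers only: a 240 second clip encoded in 60 seconds
playtime_multiplier 240 60
```

Under this reading, a multiplier below 1.00x would mean the machine cannot encode audio as fast as it plays.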
We can see from here that the performance is a tad faster running the OS from the hard drive rather than the LiveCD. Xebian lags heavily to do much of anything, including just email. Xebian does not come with an office suite, although when we installed Open Office, we had a bit of difficulty using it effectively. Running a local install onto the hard drive was significantly faster and recommended instead of running the Xebian CD.
Keep in mind, the system and video card share the same memory; tasks like Mozilla are incredibly slow, since we are taxing the system memory and the video memory at the same time. If you plan on running X on this type of system, you may be better off grabbing a minimal desktop like Blackbox or something that does not rely as heavily on video memory.
XBMC
If you don't feel like turning your $150 XBOX (technically $210 XBOX + mod chip) into a $150 computer, you can always turn it into a $150 DVD player. XBOX Media Center, or XBMC, is a full-featured package that can be run on any modified XBOX. XBMC mimics Windows Media Center Edition from Microsoft, but is open source software. Not too long ago, we took a look at MythTV, a Linux-based media center package meant to be run on PC hardware, and discussed the differences between it and Microsoft's Windows Media Center Edition.
XBMC is similar to MythTV in many ways, without capture capability. Obviously, the Linux/open source aspects of the software put them in the same category, but many of the features are similar also. Since XBMC is open source, developers can implement new features to expand on the package as they feel necessary, and the most obvious add-ons implemented on one can easily be added to the other. For example, the weather forecast program built into XBMC has been coded as an add-in for MythTV. This does not necessarily mean that it is the same code, but rather just the same idea.
Besides the extras, XBMC's core functionality is the ability to play audio and video of various formats (with the help of different codecs), slideshows of pictures, as well as CDs and DVDs from the DVD-ROM drive. Programs can also be launched from XBMC as an alternative to the OS installed after modding the hardware. The only function missing from XBMC, which is included in PC-based media center packages, is the TV functionality. Since a TV tuner card cannot be installed in an XBOX, media playback is limited to local/networked/streamed material. If you use your XBMC as a local platform to play ripped movies off the network, you have a very powerful, sleek network player.
Since Microsoft has gladly included a 10/100 Ethernet port on the XBOX, it can be networked to any PC network to allow sharing of files through SMB shares or FTP. For XBMC, SMB shares can be set up to stream files over a local area network.
XBMC is also fully skin-able to further personalize the experience. Using PNG files, like MythTV does, XBMC can be customized to mimic Windows Media Center or any other media center package.
Right now, XBMC is a little rough around the edges. The weather plugin and network browser are certainly awesome features, but DVD playback and menu options look like they still need a little work. XBMC shows some promise, and when the project matures enough, we will definitely anticipate using it in a more ambitious manner.
A Beowulf Cluster
So far, we have played around a little bit with the idea of a stand-alone XBOX doing some neat things. But what if we want to actually make a high availability processing cluster across all of our Linux machines at once? This is the murkier world of XBOX PCs, distributed computing. There are a lot of really good documents detailing how to set up a secure, robust and stable Beowulf Cluster, but this isn't one of them. We only want to benchmark 8 XBOXes in parallel operation.
We started by finding a good location for our XBOX cluster to run. A few 2x4 wood boards stacked together on the lab floor were good enough for us, apparently. Just to get an idea of how big our cluster was going to be, we stacked the XBOXes up in two columns of four as you can see below. It is a good idea to have plenty of room on all 4 sides of your cluster so that you can easily check cabling and such.
A 16 port gigabit switch provided us with network connectivity. Unfortunately, we are not actually using gigabit connectivity since the XBOX network devices only operate at 100Mbps. A 100Mbps switch usually runs for about $50, although it isn't wise to go too cheap on the largest bottleneck in the cluster. We crimped our own cabling as you can see in the image below. Even though we are just hacking a system together, keeping the cabling organized is very important since diagnosing a bad system will be nearly impossible if wires are all over the place.
The next step was actually placing the chip in each machine and removing the hard drive so that we could image it. Modifying each of the XBOXes took some time. It took us a few hours to get all of the mod chips in correctly and start up the SmartXX BIOS. By default, SmartXX does not boot to Linux, so we had to tell the BIOS after each chip was inserted.
First, we installed Linux on the master XBOX machine. This machine will eventually become the master of the cluster, but we will use it as a template for the other machines before we clone the hard drive onto the other XBOXes. All of the programs that we planned on using, such as distcc, djohn, etc., were installed on this template machine.
We will need to identify each of our XBOXes in the hosts file. This will allow us to identify each machine without doing a DNS lookup on its alias. Our hosts file looks like this:
master# cat hosts
::1             localhost
127.0.0.1       localhost
192.168.1.10    master
192.168.1.11    slave1
192.168.1.12    slave2
192.168.1.13    slave3
...
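Rather than typing those entries by hand, the cluster portion of the hosts file could be generated. The sketch below is ours: `cluster_hosts` is a hypothetical name, and it assumes that the entries elided above simply continue the same pattern, one address per slave in ascending order after the master.

```shell
#!/bin/bash
# Hypothetical generator for the cluster entries of a hosts file.
# Arguments: network prefix, host octet of the master, number of slaves.
# Assumes the slaves take consecutive addresses right after the master.
cluster_hosts() (
    prefix="$1"; first="$2"; slaves="$3"
    printf '%s.%d\tmaster\n' "$prefix" "$first"
    i=1
    while [ "$i" -le "$slaves" ]; do
        printf '%s.%d\tslave%d\n' "$prefix" "$((first + i))" "$i"
        i=$((i + 1))
    done
)

# One master and seven slaves, matching our eight-node cluster
cluster_hosts 192.168.1 10 7
```

The output could be appended to the hosts file on the template machine before its drive is cloned, so every node starts with the same name table.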
By using a hard drive blaster, we were able to image all of our other XBOX hard drives in about an hour. Using Ghost or another drive imaging utility will work as well, but you can only do one or two drives at a time instead. If you are really desperate (or cheap), installing Linux on each machine manually works, but leaves the most room for error. Right now, we have eight XBOXes that are exactly identical in every respect. We need to designate one of these to be the master, so the first machine is booted up for a little bit of configuration. Dhcpd, the program that will assign IP addresses to each of our slave machines, was configured as such:
# dhcpd.conf
#
option domain-name "cluster";
option domain-name-servers 192.168.1.200;
option subnet-mask 255.255.255.0;
default-lease-time 3600;
max-lease-time 86400;
ddns-update-style none;
subnet 192.168.1.0 netmask 255.255.255.0 {
    range 192.168.1.11 192.168.1.17;
    option routers 192.168.1.1;
}
We really don't care which IP gets assigned to which machine; although, if we boot each XBOX in sequential order on the rack, their IP address will line up with their position in the boot sequence.
A Beowulf Cluster (Cont'd)
Since building a cluster of Linux XBOXes was something that no one at Microsoft (probably) ever intended, there are some rough edges. The machines do not stack entirely well as the top of each XBOX is slightly curved. It would be asinine to use anything other than duct tape to correct this problem; although, heavy duty double-sided tape would probably make more sense if you were not trying to cover the air intakes on the sides of the XBOX.
Now that all of our machines have hard drive images and have been "installed" in our rack, we restarted dhcpd on the master machine and powered on the cluster. On the master XBOX, we can view the slaves connecting to the master and gaining an IP.
Configuring the cluster to do different things can be a little difficult; obviously, we do not want to log into each machine and execute each command by hand. Instead, we opted to use ssh to launch commands remotely. We wrote a simple script below named cluster_control.sh:
#!/bin/bash
# Run the command given as arguments on every slave in turn
for i in 192.168.1.11 192.168.1.12 192.168.1.13 ...; do
    ssh root@$i "$@"
done
Since we do not want to enter our password every time we run a command, we wrote another script to set up our public keys and allow each slave machine on the cluster to authenticate via authorized keys instead of a password.
#!/bin/bash
# Generate a key pair on the master, then install the public key on each slave
ssh-keygen -t rsa
for i in 192.168.1.11 192.168.1.12 192.168.1.13 ...; do
    scp ~/.ssh/id_rsa.pub root@$i:~/.ssh/authorized_keys
done
Now we can run single line commands via our cluster_control.sh script. Given a little more time, we could create something a little more robust, but we are just trying to launch some benchmarks. You'll notice that we are just logging in as root and ssh'ing around with no regard for what programs we are running as root. Security is not an issue here because our cluster network is physically removed from all other networks. However, if this were a real cluster, we would take more drastic measures of locking the machines down. We have just set up a very basic Beowulf Cluster made out of eight XBOXes.
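As a rough idea of what "a little more robust" might look like, here is a sketch that takes the host list as an argument and reports which machines failed. It is not the script that we ran; `run_on_cluster` and the `SSH_CMD` variable are hypothetical names introduced for illustration, and `SSH_CMD` exists only so that the loop can be dry-run with `echo` instead of a real connection.

```shell
#!/bin/bash
# Hypothetical variant of cluster_control.sh that reports failures.
# SSH_CMD can be overridden (for example with "echo") for a dry run.
SSH_CMD="${SSH_CMD:-ssh}"

# Usage: run_on_cluster "host1 host2 ..." command [args...]
# Exit status is the number of hosts on which the command failed.
run_on_cluster() (
    hosts="$1"
    shift
    failed=0
    for host in $hosts; do
        # stdin comes from /dev/null so ssh cannot swallow the caller's input
        if ! "$SSH_CMD" "root@$host" "$@" </dev/null; then
            echo "FAILED: $host" >&2
            failed=$((failed + 1))
        fi
    done
    exit "$failed"
)
```

A call such as `run_on_cluster "192.168.1.11 192.168.1.12" uptime` would then name any unreachable node on standard error rather than failing silently, which helps when one XBOX in the stack did not boot.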
Distributed Compiling
Distributed compiling was one of the original goals of this project. We do not need to throw a whole lot of computing power at gcc in order to compile something, but when compiling some things (like GCC itself!), we can really benefit from running a lot of different jobs if hard drive and network IO do not slow us down. There happens to be an excellent program called distcc that acts as a front end to run GCC over several machines at the same time. Since we installed distcc on our original cluster master before cloning it, we only need to jump start the daemon using our cluster_control.sh script.
Network IO could really hurt us here. Each slave machine can utilize the entire 100Mbps of network traffic, but our master might be uploading to more than one machine at a time. If we have to upload files to seven other machines on the cluster at once, we would be limited to only about 1.8 megabytes (14.3 megabits) per second for each machine. It may make a lot more sense for us to run a separate dedicated PC with a gigabit Ethernet card as the master instead - at least for distcc. We will test both cases here to see how network IO affects our build.
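The per-machine figure is just the master's link rate divided evenly among the slaves. A minimal sketch of that division follows; `per_slave_bandwidth` is a made-up name, and the even split is an idealization that ignores protocol overhead and switch behavior.

```shell
#!/bin/bash
# Hypothetical helper: even split of the master's uplink across slaves.
# Arguments: link rate in Mbps, number of slaves receiving at once.
per_slave_bandwidth() {
    awk -v link="$1" -v slaves="$2" 'BEGIN {
        mbit = link / slaves
        printf "%.1f Mbit/s (%.2f MB/s)\n", mbit, mbit / 8
    }'
}

# XBOX master on its 100Mbps port, feeding seven slaves
per_slave_bandwidth 100 7

# Dedicated master with a gigabit card, feeding the same seven slaves
per_slave_bandwidth 1000 7
```

With a gigabit master, the ideal share per slave exceeds what a slave's own 100Mbps port can accept, so the bottleneck would move from the master to the slaves.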
Since there are eight machines on the cluster, we want to make sure that there are enough make jobs running to satisfy each processor on the node. Deciding the exact number of jobs under distcc is not an exact science, but fortunately, all of our machines are the same speed, so that will alleviate some headaches. We tried compiling GCC 3.4.2 on our cluster using distcc under 9 and 17 jobs.
# ./cluster_control.sh distccd --daemon
# export DISTCC_HOSTS='master slave1 slave2 slave3 slave4 slave5 slave6 slave7'
# make -j9 CC=distcc
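The two job counts that we tried, 9 and 17, happen to equal one and two jobs per node plus one spare. The article does not say that this is how they were chosen, so treat the formula below as our assumption. The sketch also builds the host list from the node count; both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: build a DISTCC_HOSTS value for one master plus
# the given number of slaves, using the host names from our hosts file.
distcc_hosts() (
    hosts="master"
    i=1
    while [ "$i" -le "$1" ]; do
        hosts="$hosts slave$i"
        i=$((i + 1))
    done
    echo "$hosts"
)

# Hypothetical helper: job count as (nodes * jobs per node) + 1.
# The "+ 1" spare job is an assumption, not something distcc requires.
distcc_jobs() {
    echo "$(( $1 * $2 + 1 ))"
}

echo "DISTCC_HOSTS='$(distcc_hosts 7)'"
echo "make -j$(distcc_jobs 8 1) CC=distcc"
echo "make -j$(distcc_jobs 8 2) CC=distcc"
```

The sketch only prints the commands; nothing is compiled until they are run on a machine where distcc is installed.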
We didn't really see the performance numbers that we were looking for here. We first anticipated poor performance on the XBOX cluster due to its small amounts of RAM. "In theory", if we can get our cluster to scale to 16 nodes without a huge performance hit, we would see very impressive compile scores. However, running 16 threads on a single user application does not occur that often, even with a large compile like GCC, since many things need to be made in order. A multi-user environment running hundreds of compiles at once would benefit from so many nodes; perhaps a community cross-compiling station. Just in case the master XBOX was performing poorly due to its meager 100Mbps network card, we reconfigured the cluster to obey a different host with a gigabit Ethernet card and ran the same command as above.
It looks like our cluster really wasn't affected much by the network IO after all. If we anticipate running more XBOXes, however, running the cluster from a dedicated master with at least one gigabit Ethernet card would be absolutely necessary.
Distributed Rendering
Consumer distributed computing first came into vogue in the mid-90s. While the NSA was busy simulating nuclear blasts on big iron, thousands of off-the-shelf computers a day were happily rendering frames for graphic artists. Even as Donald Becker and Thomas Sterling were coining the phrase, "Beowulf Cluster", people at Pixar and Alias were already well experienced in render scheduling on expensive SGI machines, but almost immediately, people began toying with the idea of splitting render jobs over multiple Windows NT machines. The Windows NT render farms were really the first Beowulf clusters used for anything important, and not surprisingly, there happens to be a lot of distributed rendering software out there. The process of rendering - that is, taking a raw 3D data file and then raytracing it to produce a high quality image or set of images - happens to be very distributed computing friendly. Early render farms were very basic; if 100 frames needed to be rendered and 10 machines had render software, the host machine simply sent 10 frames to each computer and waited for them to render. Today's software is much more advanced, and network latencies are low enough that the host renderer can apply components of a frame to several different computers at once.
There is some really excellent software like Rush Render Queue that will enable us to use software like Mental Ray for multiplatform rendering. Apple's QMaster also has some great features for distributed rendering for Shake. Today, however, we are just looking at a free ray tracer, POV-Ray. Below, you can see how our configurations ran under various options of the SMPOV addon for POV-Ray.
This test borders on the theoretical. Since there are no Linux SMP or multiprocessor versions of POV-Ray, we had to improvise. POV-Ray allows us to render only a component of a scene if we desire, so using NFS, we set up a directory where all the XBOX nodes had access to the benchmark.ini files. Each file was modified to only render a specific portion of the scene based on the machine host name, and then the benchmark was launched via our scripts. We went through the same process for our Opteron and Xeon machines, launching one render process per processor, effectively running the renderer in SMP mode.
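One way to carve up a scene like this is to give each node a horizontal strip using POV-Ray's `Start_Row` and `End_Row` INI options. The sketch below shows that approach only as an illustration; the article does not say whether the split was by rows or columns, and the function names, the host name mapping and the 384-row image height are all assumptions.

```shell
#!/bin/bash
# Hypothetical helper: map a cluster host name to a 0-based strip index.
# Assumes the names from our hosts file: master, slave1, slave2, ...
node_index() {
    case "$1" in
        master) echo 0 ;;
        slave*) echo "${1#slave}" ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the INI lines that restrict one node to its
# strip of the image. Rows are numbered from 1.
# Arguments: image height in rows, node count, node index (0-based).
povray_strip() (
    height="$1"; nodes="$2"; index="$3"
    start=$(( index * height / nodes + 1 ))
    end=$(( (index + 1) * height / nodes ))
    printf 'Start_Row=%d\nEnd_Row=%d\n' "$start" "$end"
)

# Strip for slave2 in an eight-node cluster (illustrative image height)
povray_strip 384 8 "$(node_index slave2)"
```

Because the strip boundaries come from integer division on the same formula, adjacent strips neither overlap nor leave gaps, even when the height does not divide evenly by the node count. The partial images still have to be stitched together afterwards.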
If our Distributed Compiling test was any forewarning, we weren't surprised that the XBOX cluster had a very difficult time being very effective here. Let's hope that the cluster can hold up a little better under encryption algorithms.
Distributed Hashing
As far as distributed computing goes, distributed hashing has always been one of the original uses of a distributed network. A novelty to some, a necessity to others, hashing a lot of keys takes a very, very long time depending on the algorithm. Hashing keys does not require as much memory as rendering or compiling, which will probably work well for the XBOX cluster. Below, you can see a quick print-out of how fast OpenSSL works on a single node of our XBOX cluster, in comparison to the Sempron 2200+ machine mentioned earlier.
XBOX
OpenSSL 0.9.7c 30 Sep 2003
built on: Mon Feb 23 18:18:12 GMT 2004
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) blowfish(idx)
compiler: cc
available timing options: USE_TOD HZ=128 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type                 8 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
md2                  557.47k    1546.09k    2091.52k    2292.74k    2362.03k
mdc2                    0.00        0.00        0.00        0.00        0.00
md4                 9312.01k   50638.08k  104728.32k  143168.85k  160380.25k
md5                 3405.55k   15094.19k   25995.73k   31601.87k   33746.11k
hmac(md5)           1456.18k    8698.11k   19730.66k   28729.00k   33202.18k
sha1                3989.46k   15676.20k   27201.45k   33287.51k   35621.55k
rmd160              2942.53k   12834.88k   21769.13k   26220.89k   27904.68k
rc4                42422.11k   56691.63k   59720.45k   60751.63k   60867.38k
des cbc             7463.69k    8229.14k    8364.52k    8364.03k    8366.76k
des ede3            2861.27k    2966.78k    2985.30k    2990.08k    2990.08k
idea cbc                0.00        0.00        0.00        0.00        0.00
rc2 cbc             5796.07k    6218.69k    6275.93k    6291.11k    6313.30k
rc5-32/12 cbc           0.00        0.00        0.00        0.00        0.00
blowfish cbc       16639.31k   20924.97k   21380.52k   21482.50k   21570.44k
cast cbc           12566.61k   14372.20k   14635.41k   14653.44k   14666.41k
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0035s   0.0003s    286.7   3299.3
rsa 1024 bits   0.0195s   0.0010s     51.2   1049.0
rsa 2048 bits   0.1167s   0.0032s      8.6    314.0
rsa 4096 bits   0.7650s   0.0111s      1.3     90.3
                  sign    verify    sign/s verify/s
dsa  512 bits   0.0032s   0.0039s    313.5    259.5
dsa 1024 bits   0.0097s   0.0120s    103.3     83.4
dsa 2048 bits   0.0317s   0.0389s     31.6     25.7
OpenSSL>
Sempron 2200+
OpenSSL 0.9.7c 30 Sep 2003
built on: Mon Feb 23 18:18:12 GMT 2004
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) blowfish(idx)
compiler: cc
available timing options: USE_TOD HZ=128 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type                16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
md2                 1307.26k    2797.72k    3917.72k    4364.70k    4511.21k
mdc2                3019.04k    3408.81k    3522.52k    3549.56k    3565.32k
md4                11304.80k   39934.94k  117963.41k  229924.63k  318395.57k
md5                 8789.64k   29747.05k   81595.00k  144216.44k  185556.36k
hmac(md5)           5081.28k   18230.65k   57049.28k  121655.26k  181687.44k
sha1                8715.38k   26775.46k   63727.77k   97333.12k  115140.36k
rmd160              7386.51k   21200.32k   45464.16k   64026.54k   72683.38k
rc4                88866.45k   94630.03k   96065.66k   96649.60k   97376.13k
des cbc            18098.33k   18586.49k   18717.35k   18809.36k   18772.63k
des ede3            6551.90k    6620.66k    6660.51k    6666.11k    6667.71k
idea cbc                0.00        0.00        0.00        0.00        0.00
rc2 cbc            15875.50k   16446.24k   16598.92k   16634.42k   16655.25k
rc5-32/12 cbc      73740.31k   88512.81k   92535.12k   94040.06k   94410.97k
blowfish cbc       44137.47k   48594.05k   49798.41k   50179.96k   50288.17k
cast cbc           30171.40k   32328.70k   32877.09k   33049.36k   33011.88k
aes-128 cbc        37067.76k   37903.95k   38377.07k   38499.61k   38531.59k
aes-192 cbc        32505.08k   33104.00k   33376.89k   33556.62k   33580.99k
aes-256 cbc        28702.51k   29251.51k   29532.40k   29604.97k   29624.04k
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0014s   0.0001s    702.7   8190.2
rsa 1024 bits   0.0074s   0.0004s    135.4   2648.0
rsa 2048 bits   0.0452s   0.0013s     22.1    798.2
rsa 4096 bits   0.2956s   0.0042s      3.4    236.2
                  sign    verify    sign/s verify/s
dsa  512 bits   0.0012s   0.0015s    820.1    653.8
dsa 1024 bits   0.0038s   0.0048s    266.1    209.8
dsa 2048 bits   0.0122s   0.0152s     81.8     65.6
OpenSSL>
Our hopes of making this distributed XBOX cluster a worthwhile experiment are slowly diminishing. Again, we aren't out to break any speed records here, but we would love to see some XBOXes scale well. As another point of reference, our Opteron 150 machine can sign about 1063 RSA 1024-bit keys per second (in a 64-bit environment); that's approximately 20 times faster than what the XBOX is capable of.
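To put the two print-outs above into numbers, the short sketch below divides a few of the Sempron figures by the matching XBOX figures. The values are copied from the 8192-byte column and the RSA 1024-bit sign/s rows; the choice of which algorithms to compare, and the function name, are ours for illustration only.

```python
# 8192-byte throughput in 1000s of bytes per second, copied from the
# OpenSSL print-outs for a single XBOX node and the Sempron 2200+.
xbox = {"md5": 33746.11, "sha1": 35621.55,
        "blowfish cbc": 21570.44, "des cbc": 8366.76}
sempron = {"md5": 185556.36, "sha1": 115140.36,
           "blowfish cbc": 50288.17, "des cbc": 18772.63}


def speedup(fast, slow):
    """Ratio of fast to slow throughput for every algorithm listed in slow."""
    return {name: fast[name] / slow[name] for name in slow}


ratios = speedup(sempron, xbox)
for name, ratio in ratios.items():
    print(f"{name:14s} {ratio:5.2f}x")

# RSA 1024-bit signatures per second: XBOX and Sempron from the print-outs,
# Opteron 150 from the text.
rsa1024_signs = {"xbox": 51.2, "sempron": 135.4, "opteron150": 1063.0}
opteron_vs_xbox = rsa1024_signs["opteron150"] / rsa1024_signs["xbox"]
print(f"Opteron 150 vs XBOX, RSA 1024 sign: {opteron_vs_xbox:.1f}x")
```

At large block sizes, the Sempron comes out roughly 5.5 times faster on MD5 and a little over twice as fast on Blowfish and DES, while the Opteron figure works out to just under 21 times the XBOX signing rate.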
The XBOX is very slow, but having a whole lot of them might scale well. We look to one of our favorite programs, John the Ripper, for more advice. Distributed John, or djohn, behaves similarly to make, assigning a different portion of the total keyspace to each machine. Djohn scales almost perfectly linearly, since cracking runs are very long and there are very few places for network latency to interfere with our hashing. Since it takes basically the same amount of computing power to hash a key of the same length with a different character set, our total password cracking power is equal to N times the password cracking power of one machine, where N is the number of machines in the cluster (assuming the machines are all the same speed). You can see below how various machines performed in the JTR benchmark tests. JTR was compiled with GCC 3.3.3 with only MMX optimizations.
A star denotes estimated performance. Our eight-way cluster fares pretty well against its workstation competition, but properly manipulating the compile options could tilt the results in any configuration's favor. Also keep in mind that 64-bit compilations of JTR yield up to 20% performance boosts on Blowfish as well, and we do not get those benefits on the 32-bit x86 platform on which the XBOXes run. Cracking keys shows some immediate promise on our XBOX cluster, but looks to be the only real application.
Lest we give the XBOX too much credit here, note that the puny Sempron performs much better at Blowfish and MD5 hashing. Building an equivalent cluster of Sempron machines would yield more than double the crunching power on Blowfish, and even more on MD5.
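The keyspace split described for djohn above can be sketched in a few lines. This is not djohn's actual code or work-unit format; it is a minimal illustration that assumes candidates are enumerated shortest first over a fixed character set, and every function name here is hypothetical. The per-node rate used at the end is a placeholder, not a measured result.

```python
def keyspace_size(charset_len, max_len):
    """Number of candidates of length 1..max_len over the character set."""
    return sum(charset_len ** n for n in range(1, max_len + 1))


def split_keyspace(total, nodes):
    """Cut [0, total) into contiguous half-open ranges, one per node."""
    base, extra = divmod(total, nodes)
    ranges, start = [], 0
    for i in range(nodes):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def index_to_candidate(index, charset):
    """Map a 0-based index to a candidate string, shortest strings first."""
    k = len(charset)
    length, block = 1, k
    while index >= block:  # skip over all shorter candidates
        index -= block
        length += 1
        block *= k
    chars = []
    for _ in range(length):
        index, digit = divmod(index, k)
        chars.append(charset[digit])
    return "".join(reversed(chars))


def cluster_rate(per_node_rate, nodes):
    """Ideal aggregate rate when all nodes run at the same speed."""
    return per_node_rate * nodes


total = keyspace_size(26, 6)        # lowercase letters, up to 6 characters
ranges = split_keyspace(total, 8)   # eight nodes, as in our cluster
print(total, ranges[0], ranges[-1])
print(cluster_rate(1000.0, 8))      # 1000.0 is a placeholder rate
```

Because each node only needs its start and end index, the nodes never have to talk to each other while they work, which is why latency has so little opportunity to hurt scaling in this kind of job.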
Final Thoughts
Turning the XBOX into a slow desktop or a limited Media Center had its advantages. The XBOX seems fast enough to do some everyday computing like email and web, although we wouldn't recommend using the bundled Xebian distribution over a local install. The XBMC software package looks like an excellent work in progress, and we ended up spending more attention on it than we had originally anticipated. The ability to play back DivX movies from a network fileserver or just bring up the weather instantly really made us wish MythTV had such a degree of control over the network. Of course, XBMC does not offer PVR functionality. Hopefully, some of the excellent work from XBMC ends up in projects like MythTV and Freevo for non-XBOX folks as well.

After several days of configuration and setup, we finally got our cluster up and running. Costs of the cluster were a little higher than we had originally anticipated; we bought a hard drive blaster, a switch and various cabling, duct tape and shelving. These other costs added another $150 to the price of eight XBOXes ($1200) and mod chips ($480). The total cost of the cluster as configured in the article came out to $1830 - the cost of two Opteron 250s and a very poor dual socket motherboard. Unfortunately, we might have expected too much of our XBOX cluster. The saying goes, "Many hands make light work." The addendum should read: "unless the hands are actually four-year-old stripped down processors made for Microsoft." Had we tried this experiment in early 2002 instead of late 2004, we probably would have had more shocking results. Fortunately, cracking keys on the distributed XBOXes showed a lot of promise, particularly if we can get the network to scale high enough. Other projects that require constant CPU operation, like Folding@home and SETI@home, would be the best use for such a cluster; just remember the $1400 electricity bill per year for a 16-node cluster.
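For anyone pricing out a similar build, the arithmetic behind the figures above is laid out below. The totals are the ones quoted in the text; the per-node breakdown is simply those totals divided evenly and is not a separately quoted price.

```python
NODES = 8

xbox_total = 1200      # eight XBOXes, from the text
modchip_total = 480    # eight mod chips, from the text
other_costs = 150      # drive blaster, switch, cabling, tape, shelving

cluster_total = xbox_total + modchip_total + other_costs
cost_per_node = cluster_total / NODES

# Yearly electricity estimate quoted for a 16-node cluster,
# divided evenly to get a per-node figure.
electricity_16_nodes = 1400
electricity_per_node = electricity_16_nodes / 16

print(f"cluster total:     ${cluster_total}")
print(f"cost per node:     ${cost_per_node:.2f}")
print(f"power per node/yr: ${electricity_per_node:.2f}")
```

On these numbers, each node costs about $229 up front, and a year of continuous operation adds a bit under $90 per node in electricity.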
The ability to add more memory to our XBOX would have significantly boosted performance in render and compile operations. Since encryption and hashing rely mainly on computing power alone, our XBOX might be out of luck there, even though our cluster performed best in this test. VIA's EPIA platform has small hardware optimizations for many encryption algorithms, so we would probably see much better performance there in a different analysis. We saw in this analysis that while the XBOX does all right in some benchmarks, its only advantages are price and footprint. Building an equivalent cluster on other ~$200 PCs (that can be upgraded, no less) would theoretically yield far better performance. Expect a low-cost DIY Linux cluster guide from us in the future.
As far as clustering goes, we can do some pretty similar things with VIA's EPIA platform as well, and that will probably be the focus of a different distributed Linux project. VIA's 1000MHz Nehemiah platforms run for about $150 without memory, case or hard drive, although you can still find some of the older 800MHz EPIAs for about $100. However, the XBOX has the advantage of a readily available and extremely standardized setup. Finding a chassis combination for the EPIA platform that serves a render farm correctly might be a little harder, but we will leave that for a different article. We also would like to set up a similar cluster with PlayStation 2 consoles, but that may be another article as well.
Setting up a cluster has its advantages, provided you can utilize programs that will correctly take advantage of as much computing power as possible. Even though an XBOX cluster will scale very quickly for a relatively low price, lacking the ability to upgrade CPU and memory really drags our performance down. The computing power that we demonstrated today on the eight-node cluster resides in only a 3' by 2' by 2' volume, which is excellent for an eight-node Linux cluster. The practicality of our cluster turned out to be fairly negligible; had we seen some really outstanding performance, we probably would have been able to justify the hours of work and configuration. Perhaps NetBSD or Linux will get an early jump on XBOX 2 so that we can try out our next attempt to run Linux on Microsoft hardware a little earlier in the game.
Special thanks to BMMods for providing us review samples for this article.