Cloud = x86 and open source

From a high-level perspective, the basic architecture of Facebook is not that different from other high performance web services.

However, Facebook is the poster child of the new generation of Cloud applications. It's hugely popular and very interactive, and as such it requires much more scalability and availability than your average website that mostly serves up information.

The "Cloud Application" generation did not turn to the classic high-end redundant platforms with heavy Relational Database Management Systems. A combination of x86 scale-out clusters, open source web software, and "NoSQL" data stores is the foundation that Facebook, Twitter, Google and others build upon.

Facebook has nonetheless improved several pieces of the open source software puzzle to make them better suited to extreme scalability. Facebook chose PHP for its presentation layer because it is simple to learn, write, and read. However, PHP is very CPU and memory intensive.

According to Facebook’s own numbers, PHP is about 39 times slower than C++ code, so it was clear that Facebook had to solve this problem first. The traditional approach is to rewrite the most performance critical parts in C++ as PHP extensions, but Facebook tried a different solution: its engineers developed HipHop, a source code transformer. HipHop transforms the PHP source code into faster C++ code and compiles it with g++.

The next piece in the Facebook puzzle is Memcached, an in-RAM object caching system. Memcached is distributed, which means a single memcached cache can span many servers: the "cache" is in fact a collection of smaller caches. It puts to work spare RAM that your operating system would probably spend on less efficient file system caching. These “cache nodes” do not sync or broadcast among themselves, and as a result the memory cache is very scalable.
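To make the "collection of smaller caches" idea concrete, here is a minimal Python sketch of how a client could spread keys over independent cache nodes. It is an illustration only, not Facebook's or memcached's actual code: the `ShardedCache` class and the node names are hypothetical, plain dictionaries stand in for the servers, and simple modulo hashing is assumed (real clients often use consistent hashing instead, so that adding a node moves fewer keys).

```python
import hashlib


class ShardedCache:
    """Toy client that spreads keys over independent in-memory 'nodes'.

    Hypothetical illustration: each dict stands in for one cache server.
    """

    def __init__(self, node_names):
        self.node_names = list(node_names)
        # One private store per node; the stores never exchange data.
        self.nodes = {name: {} for name in self.node_names}

    def node_for(self, key):
        # Hash the key and pick a node by modulo. Only the client knows
        # the mapping, so the nodes need no sync or broadcast traffic.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.node_names[int(digest, 16) % len(self.node_names)]

    def set(self, key, value):
        self.nodes[self.node_for(key)][key] = value

    def get(self, key):
        # Returns None on a cache miss.
        return self.nodes[self.node_for(key)].get(key)


cache = ShardedCache(["node-a", "node-b", "node-c"])
cache.set("user:42", "Alice")
print(cache.node_for("user:42"), cache.get("user:42"))
```

Because every key lives on exactly one node and the client alone decides which one, adding servers adds capacity without adding coordination between them.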

Facebook quickly became the world's largest user of memcached and improved memcached vastly. They ported it to 64-bit, lowered TCP memory usage, distributed network processing over multiple cores (instead of one), and so on. Facebook mostly uses memcached to alleviate database load.
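Using a cache to alleviate database load usually follows a read-through pattern: check the cache first and only query the database on a miss. The Python sketch below illustrates that general pattern under stated assumptions; it is not Facebook's implementation. The `CacheAside` class, the `load_from_db` callback, and the `db_reads` counter are all hypothetical names, and a dictionary stands in for memcached.

```python
class CacheAside:
    """Toy read-through cache in front of a slow data source.

    Hypothetical illustration: a dict plays the role of memcached and
    `load_from_db` is any callable that fetches a value by key.
    """

    def __init__(self, load_from_db):
        self.cache = {}
        self.load_from_db = load_from_db
        self.db_reads = 0  # counts how often the database is actually hit

    def get(self, key):
        if key in self.cache:
            return self.cache[key]  # cache hit: no database work
        self.db_reads += 1          # cache miss: fall back to the database
        value = self.load_from_db(key)
        self.cache[key] = value
        return value

    def invalidate(self, key):
        # Drop the cached copy after a write so the next read is fresh.
        self.cache.pop(key, None)


fake_db = {"user:1": "Alice"}
store = CacheAside(lambda key: fake_db[key])
store.get("user:1")
store.get("user:1")
print(store.db_reads)  # the second read was served from the cache
```

Repeated reads of popular objects are served from RAM, so the database only sees the first request for each key and the requests that follow an invalidation.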

62 Comments

  • ezekiel68 - Thursday, November 03, 2011

    I was pretty sure it was a mistake and I only mentioned it to have the blemish removed - I've been following and admiring your technical writing since the early 2000s. Please keep on bringing us great server architecture pieces. Don't worry about Jarred, he's fine too. We all make mistakes.
  • Dug - Thursday, November 03, 2011

    I'm curious what the cost would be on the servers compared to something like the HP.
  • Lucian Armasu - Thursday, November 03, 2011

    According to SemiAccurate, Facebook is considering Calxeda's recently announced ARM servers, too. It could be a lot more efficient to run something like Facebook on those types of servers.
  • JohanAnandtech - Thursday, November 03, 2011

    I personally doubt that very much. The memcached servers are hardly CPU intensive, but a 32-bit ARM processor will not fit the bill. Even when ARM gets 64-bit, it is safe to say that x86 will offer many more DIMM slots. It remains to be seen what the ratio of watts to RAM cache will be. Until 64-bit ARMs arrive with quite a few memory channels: no go IMHO.

    And the processing intensive parts of the Facebook architecture are going to be very slow on the ARMs.

    The funny thing about the ARM presentations is that they assume that virtualization does not exist in the x86 world. A 24-thread x86 CPU with 128 GB can maybe run 30-60 VMs, lowering the cost to something like 5-10W per VM. A 5W ARM server is probably not even capable of running one of those machines at a decent speed. You'll be faced with serious management overhead to deal with 30x more servers (or worse!) and high response times (single thread performance takes a huge dive!) just to save a bit on the power bill.

    As a general rule: if the Atom based servers have not made it to the shortlist yet, they sure are not going to replace it by ARM based ones.
  • tspacie - Thursday, November 03, 2011

    The Facebook servers take a higher line voltage for increased efficiency. What voltage was supplied to the HP server for these tests?
  • JohanAnandtech - Thursday, November 03, 2011

    Both servers used 230V. I have added this to the benchmark page (Thanks, good question). So in reality the Facebook server can consume slightly less.
  • Alex_Haddock - Thursday, November 03, 2011

    TBH we'd position SL class servers for this kind of scenario rather than DL380G7 (which does have a DC power option btw) so not sure it is a relevant comparison. Though I understand using what is available to test.
  • mrsprewell - Thursday, November 03, 2011

    This review claims this Facebook server is more efficient than HP's, but I see no proof. They only compare the power supply power factor performance. But what about efficiency? I guess the lab has no 277Vac input (which most datacenters don't have either) and they can only power the server at 208/230Vac. As a result, they can't compare the servers' efficiency. Also they didn't describe under what load condition the test was done... I am sure the HP server has better efficiency than the Facebook one at 230Vac input. The only good thing about the Facebook one is that it might not need a UPS. But the consequence of that is, you have to use the battery rack from Facebook, which is not standard and can be costly.

    Also it is nice to know that the Powerone power supply will overheat when using DC input for more than 10min....hahahahh...that's a smart way to cost down the power supply...
  • marc1000 - Thursday, November 03, 2011

    what does this "noSQL" mean??? they don't use any relational database at all? how does Facebook store information? plain files?
  • erple2 - Thursday, November 03, 2011

    Google doesn't use relational databases to store and retrieve its information either. Neither does the high performance data warehouse that was developed on a program I worked on a few years ago - we migrated away from Oracle for cost and performance reasons.

    I think that the days of the Relational Database are numbered. The mainstay of the Relational Database (stored procedures) is quickly showing its age in a complete inability to debug issues with them outside of expensive specialized tools. We've been replacing them as much as we can with an abstraction layer.

    But we still have goofy constructs to deal with (joins just don't make sense from an OO perspective).

    I think that Relational Databases' days are numbered.