Facebook Technology Overview

Facebook had 22 million active users in mid-2007; fast forward to 2011 and the site now has 800 million active users, 400 million of whom log in every day. Facebook has grown exponentially, to say the least! Coping with this kind of exceptional growth while offering a reliable and cost-effective service requires out-of-the-box thinking. Typical high-end, brute-force, ultra-redundant software and hardware platforms (for example, Oracle RAC databases running on top of a few IBM Power 795 systems) won't do: they are too complex, too power hungry, and most importantly far too expensive for such extreme scaling.

Facebook first focused on thoroughly optimizing its software architecture, which we will cover briefly. The next step was for Facebook's engineers to build their own servers in order to minimize the power draw and cost of their server infrastructure. Facebook Engineering then open sourced these designs to the community; you can download the specifications and mechanical CAD designs from the Open Compute site.

The Facebook Open Compute server design is ambitious: “The result is a data center full of vanity free servers which is 38% more efficient and 24% less expensive to build and run than other state-of-the-art data centers.” Even better, Facebook Engineering sent two of these Open Compute servers to our lab for testing, allowing us to see how they compare to other solutions on the market.

As a competing solution we have an HP DL380 G7 in the lab. Recall from our last server clash that the HP DL380 G7 was one of the most power-efficient servers of 2010. Is a server "targeted at the cloud" and designed by Facebook Engineering able to beat one of the best and most popular general-purpose servers? That is the question we'll answer in this article.
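Before getting to the measurements, here is a minimal sketch of how a performance-per-watt comparison between two servers can be expressed. The throughput and power figures in it are hypothetical placeholders for illustration only, not measured results from our lab or numbers published by Facebook or HP.

```python
# Performance-per-watt comparison sketch. The throughput and power figures
# below are hypothetical placeholders, NOT measured results.

servers = {
    "HP DL380 G7 (hypothetical figures)":         {"throughput": 100.0, "power_w": 260.0},
    "Open Compute server (hypothetical figures)":  {"throughput": 100.0, "power_w": 200.0},
}

for name, s in servers.items():
    perf_per_watt = s["throughput"] / s["power_w"]
    print(f"{name}: {perf_per_watt:.3f} units of work per watt")

# Relative power saving at equal throughput (placeholder numbers):
hp, oc = servers.values()
saving = (hp["power_w"] - oc["power_w"]) / hp["power_w"] * 100
print(f"Power saved at equal throughput: {saving:.0f}%")
```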

Cloud Computing = x86 and Open Source
Comments

  • mczak - Thursday, November 03, 2011

    I think the possibility of this chip having been shrunk since 2005 is 0%. The other question is whether it was shrunk from the RV100 or whether it's actually the same die - even if it was shrunk, it was probably to an already mature process like 130nm in 2005; otherwise it's 180nm.
    At 130nm (estimated below 20 million transistors) the die would already be very small and probably couldn't get any smaller due to I/O anyway. Most of the power draw might be due to I/O too, so a shrink wouldn't help there either. It is possible, though, that it's really below 1W (when idle).
  • Taft12 - Thursday, November 03, 2011

    A shrink WOULD allow production of many more units on each wafer. Since almost every server shipped needs an ES1000 chip, demand is consistently on the order of millions per year.
  • mczak - Thursday, November 03, 2011

    There's a limit to how much I/O you can have for a given die size (actually the limiting factor is not area but circumference, so making the die rectangular sort of helps). I/O pads apparently don't shrink well, hence if your chip has to be a certain size because you have too many I/O pads, a shrink will do nothing but make it more expensive (since smaller process nodes are generally more expensive per area).
    Being I/O bound is quite possible for some chips, though I don't know if this one really is - it has at least display outputs, a 16-bit memory interface, a 32-bit PCI interface, and the required power/ground pads.
    In any case, even at 180nm the chip should already be below 40mm², hence the die cost is probably quite low compared to packaging, the cost of memory, etc. (A rough back-of-the-envelope sketch of this pad-limit argument follows below.)
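To illustrate the pad-limit argument in the comments above, here is a rough back-of-the-envelope sketch. The logic density, pad count, and pad pitch are ballpark assumptions chosen purely for illustration; they are not ES1000 specifications.

```python
# Rough die-size sketch for a small I/O-heavy chip. Density, pad count and
# pad pitch below are ballpark assumptions, NOT ES1000 specifications.
import math

transistors = 20e6                 # "below 20 million transistors" (comment above)
density_per_mm2 = {                # assumed usable logic density per node
    "180nm": 0.5e6,
    "130nm": 1.0e6,
}
pad_count = 200                    # assumed: display out, 16-bit DRAM, 32-bit PCI, power/ground
pad_pitch_mm = 0.1                 # assumed wire-bond pad pitch

for node, density in density_per_mm2.items():
    core_area = transistors / density               # area the logic itself needs
    # Pad-limited floor: enough edge length for all pads around a square die.
    min_edge = pad_count * pad_pitch_mm / 4
    pad_floor = min_edge ** 2
    die_area = max(core_area, pad_floor)
    bound = "pad-limited" if pad_floor > core_area else "core-limited"
    gross_dies = math.pi * (200 / 2) ** 2 / die_area  # 200mm wafer, edge loss ignored
    print(f"{node}: core {core_area:.0f} mm², pad floor {pad_floor:.0f} mm² -> "
          f"~{die_area:.0f} mm² die ({bound}), ~{gross_dies:.0f} gross dies per 200mm wafer")
```

With these assumptions the shrink saves less area than the raw density gain suggests because the pad floor takes over, which is essentially mczak's point; the gross-dies-per-wafer figure reflects Taft12's.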
  • Penti - Saturday, November 05, 2011

    It's the integrated BMC/iLO solution, which also includes a GPU, that would use more power than the ES1000 anyhow. That is also what is lacking in the simple Google/Facebook compute-node setup: they don't need that kind of management and can handle a node going offline.
  • haplo602 - Thursday, November 03, 2011

    It seems to me that the HP server is doing about as well as the Facebook ones, considering it has more features (remote management, integrated graphics) and a "common" PSU.
  • JohanAnandtech - Thursday, November 03, 2011

    The HP does well. However, if you don't need integrated graphics and hardly ever use the BMC, you still end up with a server that wastes power on features you hardly use.
  • twhittet - Thursday, November 03, 2011

    I would assume cost is also a major factor. Why pay for so many features you don't need? Manufacturing costs should be lower if they actually build these in bulk.
  • jamdev12 - Thursday, November 03, 2011

    I would definitely have to agree with you on this. HP servers are pretty expensive once you take into account 3-year warranties and 24/7 replacement options, so an Open Compute server is a nice alternative to the "I can do everything" server. Better to stick to something you can do well and efficiently than to do many things poorly.
  • haplo602 - Friday, November 04, 2011

    This is an option for somebody with a custom-built infrastructure and dedicated datacenter services; a general-purpose server, however, cannot do without those features.

    Since the server categories are different (general purpose vs. custom built), the HP one does well (I'd say even excellent).
  • HollyDOL - Thursday, November 03, 2011

    I would be quite interested in how they determined that Java and C# are 2-3x slower than C++, since that seems pretty far from reality to me. I have seen a few C++ vs. Java tests and the differences were a matter of percent. Likewise, in my experience C# does the same jobs a little faster than Java, and benchmark results generally confirm it.
    A few links:

    http://blog.cfelde.com/2010/06/c-vs-java-performan...
    http://reverseblade.blogspot.com/2009/02/c-versus-...
