38 Comments

  • blosphere - Wednesday, November 24, 2010

    Oh my cable arms on the first page pic :(

    As for the consolidation, you don't want to do it that way. The proper way is to have two 1-port 10G cards or, if you're counting every dollar, one 2-port card. Then you set the production traffic to an active/standby config (different VLANs, of course), and when configuring the vMotion/VMkernel port you override the port failover order so that it is the reverse of the production traffic's (its own VLANs, of course).

    This way you utilise both ports on the cards and you have mediocre HA (not that VMware should be called an HA system in the first place), since the production traffic would fail over to the vMotion/VMkernel port and vice versa.
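
    The reversed failover order can be pictured with a small model. This is only a sketch: the uplink and port group names are made up for illustration and do not come from any VMware API or from the whitepaper.

```python
# Hypothetical model of the failover order described above. The names
# "uplink_a", "uplink_b", "production" and "vmotion" are illustrative only.
FAILOVER_ORDER = {
    "production": ["uplink_a", "uplink_b"],  # active first, then standby
    "vmotion": ["uplink_b", "uplink_a"],     # reversed priority
}


def carrying_uplink(port_group, link_up):
    """Return the first uplink in the failover order whose link is up."""
    for uplink in FAILOVER_ORDER[port_group]:
        if link_up.get(uplink, False):
            return uplink
    return None  # both ports lost


def placement(link_up):
    """Map every port group to the uplink that currently carries it."""
    return {pg: carrying_uplink(pg, link_up) for pg in FAILOVER_ORDER}


# Normal operation: each traffic type has a port to itself.
print(placement({"uplink_a": True, "uplink_b": True}))
# One port lost: both traffic types share the surviving port.
print(placement({"uplink_a": False, "uplink_b": True}))
```

    With both links up, the two traffic types never share a port; they only end up on the same one after a failure.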

    All this stuff is in the VMware/Cisco whitepaper. We deployed it a few years ago to our datacentres worldwide, around 100 ESXi hosts and 3000+ VM guests, and it works like a charm when things start going wrong. Of course, VMware itself does cause some problems in a port-loss situation, but that's a different story.
  • mino - Wednesday, November 24, 2010

    Agreed, Agreed and again Agreed :).
  • Dadofamunky - Thursday, November 25, 2010

    Two thumbs up for this.
  • DukeN - Wednesday, November 24, 2010

    And what type of switch would actually have the switching capacity to push this type of traffic through in a dedicated manner? That is a cost to be considered.

    That being said, I think well-priced FC might still be better from a CPU usage standpoint.
  • mino - Wednesday, November 24, 2010

    FC is better at everything! Problem being, it is a "bit" more expensive.

    So for an SMB or storage-I/O-light apps? 10G all the way.

    For enterprise database stuff? Think about it very thoroughly before committing to 10G. And even then, you'd better forget about iSCSI.

    Consolidating everything Ethernet into 2*10G? Great. Just do it!
    But do not forget to get the security boys on board before making a proposal to your CIO :D
    No, even a Nexus 1000V would not help you ex post ...
  • Inspector2211 - Wednesday, November 24, 2010

    Myricom was one of the 10G pioneers and now has a 2nd-generation lineup of 10G NICs, with any physical connection option you can imagine (thick copper, thin copper, long-range fiber, short-range fiber).

    I picked up a pair of new first-gen Myricom NICs on eBay for $200 each and will conduct my own performance measurements soon (Linux box to Linux box).
  • iamkyle - Wednesday, November 24, 2010

    Last I checked, Myricom has no 10G over CAT5e/6 UTP product available.
  • mianmian - Wednesday, November 24, 2010

    I guess Light Peak products may hit the 10G Ethernet market first. They would greatly reduce the cost and energy use of those servers.
  • mino - Wednesday, November 24, 2010

    First:
    The article does not mention what kind of setup you are simulating.
    Surely the network (HTTP?) latency is not in the tens of milliseconds, is it?

    Second:
    Port consolidation? Yes, a great thing, but do not compare oranges to apples!
    There is a huge difference between consolidating those 10+ Ethernet interfaces (easy) and folding in what was previously an FC SAN (VERY hard to do properly).

    Pretending that Ethernet (be it 1Gb or 10Gb) is in the performance class of even a 4G FC SAN is a BIG fail.

    A 10Gb Ethernet SAN (dedicated!) is a great el-cheapo data-streaming solution.
    Just try not to hit it with a write-through database.

    If your 4G SAN utilization is in the <10% range and you have no storage-heavy apps, FCoE or even iSCSI is a very cost-effective proposition.
    Yet even then it is prudent to go for a 2*10G + 2*10G arrangement of SAN + everything else.

    I have yet to see a shaper that does not kill latency ...

    Since no test description was given, one has to assume you got ~4x the latency when shaping as well.

    The article in itself was enlightening, so keep up the good work!

    Please try not to think purely in SMB terms. There are MANY apps which would suffer tremendously going from FC latency to Ethernet latency.

    FYI, one unnamed storage virtualization vendor can pass an FC I/O operation through its virtualization box in well under 150us.
    That same vendor has observed the best 1GbE solutions choke at <5k IOps, 10GbE at ~10k IOps while a basic 2G FC does ~20k IOps, 4G ~40k IOps and 8G up to ~70k IOps.
  • JohanAnandtech - Thursday, November 25, 2010

    I agree with you that consolidating storage and network traffic should not be done for heavy transactional databases that already require 50% of your 10 GbE pipe.

    However, this claim is a bit weird:

    "That same vendor has observed the best 1GbE solutions choke at <5k IOps, 10GbE at ~10k IOps while a basic 2G FC does ~20k IOps, 4G ~40k IOps and 8G up to ~70k IOps."

    Let us assume that the average block size is 16 KB. That is 5000x16 KB or 80 MB/s for the 1 G solution. I can perfectly live with that claim, it seems very close to what we measure. However, claiming that 10G ethernet can only do twice as much seems to indicate that the 10G solution was badly configured.
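
    The arithmetic can be checked with a few lines. This is a rough sketch that assumes the 16 KB average block size stated above, counts 1 MB as 1000 KB, and ignores protocol overhead; the function names are made up for illustration.

```python
def throughput_mb_s(iops, block_kb):
    """Throughput in MB/s, counting 1 MB as 1000 KB."""
    return iops * block_kb / 1000


def link_utilization(mb_s, link_gbit):
    """Fraction of the raw line rate used, ignoring protocol overhead."""
    return mb_s * 8 / (link_gbit * 1000)


for name, iops, gbit in [("1GbE", 5_000, 1), ("10GbE", 10_000, 10)]:
    mb_s = throughput_mb_s(iops, 16)
    share = link_utilization(mb_s, gbit)
    print(f"{name}: {mb_s:.0f} MB/s, {share:.0%} of line rate")
```

    Under these assumptions the quoted 10GbE figure would use only a small fraction of the link, which is why it looks like a configuration problem rather than a bandwidth limit.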

    I agree that the latency of FC is quite a bit lower. But let us put this in perspective: those FC HBAs have been communicating with disk arrays that have several ms (in some cases >10 ms) of latency in the case of a write-through database. So 150us or 600us of latency in the HBA + cabling is not going to make the difference IMHO.
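
    As a rough illustration, assume a round 10 ms of disk array latency (an assumed figure in the range mentioned above, not a measurement) and add the two path latencies to it:

```python
def total_latency_ms(disk_ms, path_us):
    """Disk array latency plus HBA/cabling latency, in milliseconds."""
    return disk_ms + path_us / 1000


def relative_increase(base, other):
    """How much larger `other` is than `base`, as a percentage."""
    return 100 * (other - base) / base


DISK_MS = 10  # assumed write-through latency of the array
low = total_latency_ms(DISK_MS, 150)   # 150us path
high = total_latency_ms(DISK_MS, 600)  # 600us path
print(f"{low:.2f} ms vs {high:.2f} ms "
      f"({relative_increase(low, high):.1f}% more)")
```

    The picture changes if the array answers from cache, since the path latency then makes up a much larger share of the total.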

    To illustrate my point: the latency of our mixed test (Iometer/IxChariot) is as follows: 2.1 ms for the disk test (Iometer 64 KB sequential) and 330 us for the network test (high performance script of IxChariot). I think that is very acceptable to any application.
  • mino - Thursday, November 25, 2010

    Well, the main issue with 10G, especially copper, is that the medium frequencies are nowhere near FC.
    Then there is the protocol overhead of iSCSI; FCoE is much better in that respect, though.

    That IOps measure was a reliably achievable peak - meaning generally with <4k IO operations.
    10k on Gbit can be done in the lab easily, but it was not sustainable/reliable enough to consider it for production use.

    Those disk arrays have caches, and today they even have SSDs, etc. Then there is the dark-fiber DR link one has to wait for ...

    But yes, in a typical virtualized web-serving or SMB scenario 10G makes all the sense in the world.

    All I ask is that you generally not dismiss FC without discussing its strengths.
    It is a real pain explaining to a CIO why 10G will not work out when a generally reputable site AT says it is just "better" than FC.
  • JohanAnandtech - Thursday, November 25, 2010

    I do not dismiss FC. You are right that we should add FC to the mix. I'll do my best to get this into another article. That way we can see whether the total latency (= actual response time) is really worse on iSCSI than on FC.

    Then again, it is clear that if you (could) benefit from 8 Gbit/s FC now, consolidating everything into a 10 Gbit pipe is a bad idea. I really doubt 10 GbE iSCSI is worse than 4 Gb FC, but as always, the only way to check this is to measure.
  • gdahlm - Thursday, November 25, 2010

    The main issue is that Ethernet is designed to drop packets. This means that to be safe all iSCSI writes need to be synchronous, and this means you will either be hitting the disks hard or risking data loss if congestion starts dropping packets, you fill your ring buffer, etc.

    Even with ZFS and a SSD ZIL you will be slower than ram based WriteBack Cache.

    As an example, here are some filebench OLTP results from a Linux-based host to a ZFS array over 4Gb FC.

    Now, this is a pretty cheap array, but it will show the difference between an SSD-backed ZIL and using the memory in writeback mode.

    Host: noop elevator with Direct IO to SSD ZIL
    6472: 77.657: IO Summary: 486127 ops, 8099.587 ops/s, (4040/4018 r/w), 31.6mb/s, 683us cpu/op, 48.4ms latency

    Host: noop elevator with Direct IO with WB cache on the target.
    18042: 73.066: IO Summary: 767336 ops, 12778.487 ops/s, (6373/6340 r/w), 50.0mb/s, 481us cpu/op, 29.6ms latency
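
    To put the two summary lines side by side, a short script can pull out the ops/s and latency figures. This is only a sketch: the regular expressions are written against the two lines quoted above and are not a general filebench parser.

```python
import re

ZIL = ("6472: 77.657: IO Summary: 486127 ops, 8099.587 ops/s, "
       "(4040/4018 r/w), 31.6mb/s, 683us cpu/op, 48.4ms latency")
WB = ("18042: 73.066: IO Summary: 767336 ops, 12778.487 ops/s, "
      "(6373/6340 r/w), 50.0mb/s, 481us cpu/op, 29.6ms latency")


def parse(line):
    """Return (ops per second, latency in ms) from one summary line."""
    ops = float(re.search(r"(\d+(?:\.\d+)?) ops/s", line).group(1))
    lat = float(re.search(r"(\d+(?:\.\d+)?)ms latency", line).group(1))
    return ops, lat


zil_ops, zil_lat = parse(ZIL)
wb_ops, wb_lat = parse(WB)
print(f"ops/s ratio (WB cache vs SSD ZIL): {wb_ops / zil_ops:.2f}x")
print(f"latency reduction: {100 * (zil_lat - wb_lat) / zil_lat:.0f}%")
```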

    Basic FC switches are cheap compared to 10gig switches at this point in time too.
  • JohanAnandtech - Friday, November 26, 2010

    "The main issue is that Ethernet is designed to drop packets. This means that to be safe all iSCSI writes need to be synchronous"

    IMHO, this only means that congestion is a worse thing for iSCSI (so you need a bit more headroom). Why would writes not be async? Once you hit the cache of your iSCSI target, the packets are in the cache and Ethernet is not involved anymore. So a smart controller can perform writes asynchronously, as Matt showed in his ZFS article.

    "Even with ZFS and a SSD ZIL you will be slower than ram based WriteBack Cache."

    Why would write back cache not be possible with iSCSI?

    "Basic FC switches are cheap compared to 10gig switches at this point in time too. "

    We just bought a 10GbE switch from Dell: about $8000 for 24 ports. Lots of 24-port FC switches are quite a bit more expensive. It is only a matter of time before 10GbE switches are much cheaper.

    Also, with VLANs, the 10GbE can be used for both network and SAN traffic.
  • gdahlm - Friday, November 26, 2010

    I may be missing the part where Matt talked about async on ZFS. I only see where he was discussing using SSDs for an external ZIL. However, writing to the ZIL is not an async write; it is fully committed to disk, even if that commit is happening on an external log device.

    There are several vendors who do support async iSCSI writes using battery-backed cache, etc. But moving up to the 10G performance level puts them at a price point where the cost of switches is fairly trivial.

    iSCSI is obviously a routable protocol, and thus it often traverses more than one top-of-rack switch. Because of this, lots of targets and initiators tend to be configured in a conservative manner. COMSTAR is one such product: all COMSTAR iSCSI writes are synchronous, which is why you gain any advantage from an external ZIL. It appears that the developers assumed that FC storage is “reliable”, and thus (at least in Solaris 11 Express) zvols that are exported through COMSTAR are configured as writeback by default. You need to actually use stmfadm to specifically enable the honoring of sync writes and thus the use of the ZIL on the pool or SSD.

    I do agree that 10Gig Ethernet will be cheaper soon. I do not agree that it is cheaper at the moment.

    Dell does have a 10GbE switch for about 8K, but that is without the SFP modules. Qlogic has a 20-port 8Gb switch that can be purchased for about 9K with the SFP modules.
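
    A back-of-the-envelope per-port comparison, as a sketch only. It assumes the Dell switch is the 24-port unit mentioned earlier in the thread and uses the rounded prices quoted here; module prices are not given, so the script only derives the break-even module price.

```python
def cost_per_port(price_usd, ports):
    """Switch price divided evenly over its ports."""
    return price_usd / ports


dell_10gbe = cost_per_port(8000, 24)  # without SFP modules (assumed 24 ports)
qlogic_fc = cost_per_port(9000, 20)   # with SFP modules

# Per-port module price at which both options cost the same per port.
break_even_sfp = qlogic_fc - dell_10gbe

print(f"10GbE: ${dell_10gbe:.0f}/port before modules")
print(f"8Gb FC: ${qlogic_fc:.0f}/port including modules")
print(f"break-even module price: ${break_even_sfp:.0f}/port")
```

    On these numbers, the 10GbE switch only stays cheaper per port if each module costs less than the break-even figure.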

    If you have a need for policy on your network side or other advanced features, the cost per port for 10GbE goes up dramatically.

    I do fully expect that this cost difference will change dramatically over the next 12 months.

    Ideally, had Sun not been sold when it was, we would probably have a usable SAS target today, but it appears that work on the driver has stopped.

    This would have enabled the use of LSI's SAS switches, which are dirt cheap, to provide full 4-lane 6Gb/s connectivity to hosts through truly inexpensive SAS HBAs.
  • Photubias - Friday, November 26, 2010

    quote "That same vendor has observed the best 1GbE solutions choke at <5k IOps..."

    This guy achieves 20k IOPS through 1GbE: http://www.anandtech.com/show/3963/zfs-building-te... ?
  • blowfish - Wednesday, November 24, 2010

    Trying to read this article is making my brain hurt! ;(

    It surprises me to see what look like PCI connectors on the NICs, though. Are servers slower to adopt new interfaces?
  • Alroys - Wednesday, November 24, 2010

    They are not PCI; they are actually PCIe x8.
  • blowfish - Thursday, November 25, 2010

    Oh, thanks for the clarification!
  • blandead - Wednesday, November 24, 2010

    On that note, does anyone know how to aggregate ports (combining 1GbE ports) on an Extreme switch, rather than using a failover solution like the one mentioned above? Just a general idea of the commands will point me in the right direction : )
    Much appreciated if anyone replies!
  • fr500 - Wednesday, November 24, 2010

    I guess there is LACP or PAgP, or some proprietary solution.

    A quick Google search told me it's called cross-module trunking.
  • mlambert - Wednesday, November 24, 2010

    FCoE, iSCSI (*not that you would, but you could), FC, and IP all across the same link. Cisco offers VCP LACP with CNA as well. 2 links per server, 2 links per storage controller: that's not many cables.
  • mlambert - Wednesday, November 24, 2010

    I meant VPC, and Cisco is the only one that offers it today. I'm sure Brocade will in the near future.
  • Zok - Friday, November 26, 2010

    Brocade's been doing this for a while with the Brocade 8000 (similar to the Nexus 5000), but their new VDX series takes it a step further for FCoE.
  • Havor - Wednesday, November 24, 2010

    Though these network adapters are really nice for servers, I don't need a managed NIC; I just really want affordable 10Gbit over UTP or STP.

    Even if it's only 30~40 m / 100 ft, because, just like with 100Mbit networks in the old days, my HDs are more than a little outperforming my network.

    Wondering when 10Gbit will become common on mobos.
  • Krobar - Thursday, November 25, 2010

    Hi Johan,

    Wanted to say nice article first of all, you pretty much make the IT/Pro section what it is.

    In the descriptions of the cards and the conclusion you didn't mention Solarflare's "Legacy" Xen netfront support. This only works for paravirt Linux VMs and requires a couple of extra options at kernel compile time, but it runs like a train and requires no special hardware support from the motherboard at all. None of the other brands support this.
  • marraco - Thursday, November 25, 2010

    I once made a summary of the total cost of the network in the building where I work.

    The total cost of the network cables was far larger than the cost of the equipment (at least at my country's prices). Also, solving any cable-related problem was complete hell. There were hundreds of cables, all tangled above the false ceiling.

    I would happily replace all that with 2 or 3 cables with cheap switches at the end. Selling the cables would pay for the new equipment and even give a profit.

    Each computer has its own cable to the central switch. A crazy design.
  • mino - Thursday, November 25, 2010

    If you go 10G for cable consolidation, you'd better forget about cheap switches.

    The real savings are in the manpower, not the cables themselves.
  • myxiplx - Thursday, November 25, 2010

    If you're using a Supermicro Twin2, why don't you use the option for the on board Mellanox ConnectX-2? Supermicro have informed me that with a firmware update these will act as 10G Ethernet cards, and Mellanox's 10G Ethernet range has full support for SR-IOV:

    Main product page:
    http://www.mellanox.com/content/pages.php?pg=produ...

    Native support in XenServer 5:
    http://www.mellanox.com/content/pages.php?pg=produ...
  • AeroWB - Thursday, November 25, 2010

    Nice Article,

    It is great to see more tests around virtual environments. What surprises me a little bit is that at the start of the article you say that ESXi and Hyper-V do not support SR-IOV yet. So I was kind of expecting a test with Citrix XenServer to show the advantages of that. Unfortunately it's not there. I hope you can do that in the near future.
    I work with both VMware ESX and Citrix XenServer; we have a live setup of both. We started with ESX and later added a XenServer system, but as XenServer is getting more mature and gaining more and more features, we will probably replace the ESX setup with XenServer (as it is much, much cheaper) when maintenance runs out in about one year, so I'm really interested in tests on that platform.
  • Kahlow - Friday, November 26, 2010

    Great article! The argument between fiber and 10gig E is interesting, but from what I have seen it is so application- and workload-dependent that you would need a 100-page review to figure out which medium is better for which workload.
    Also, in most cases your disk arrays are the real bottleneck, and max’ing your 10gig E or your FC isn’t the issue.

    It is good to have a reference point though and to see what 10gig translates to under testing.

    Thanks for the review,
  • JohanAnandtech - Friday, November 26, 2010

    Thanks.

    I agree that it highly depends on the workload. However, there are lots and lots of smaller setups out there that are now using unnecessarily complicated and expensive setups (several physically separated GbE and FC networks). One of our objectives was to show that there is an alternative. As many readers have confirmed, dual 10GbE can be a great solution if you're not running some massive databases.
  • pablo906 - Friday, November 26, 2010

    It's free and you can get it up and running in no time. It's gaining a tremendous number of users because of the recent Virtual Desktop licensing program Citrix pushed. You could double your XenApp (MetaFrame Presentation Server) license count and upgrade them to XenDesktop for a very low price, cheaper than buying additional XenApp licenses. I know of at least 10 very large organizations that are testing XenDesktop and preparing rollouts right now.

    What gives? VMware is not the only hypervisor out there.
  • wilber67 - Sunday, November 28, 2010

    Am I missing something in some of the comments?
    Many are discussing FCoE and I do not believe any of the NICs tested were CNAs, just 10GE NICs.
    FCoE requires a CNA (Converged Network Adapter). Also, you cannot connect them to a garden-variety 10GE switch and use FCoE. And don't forget that you cannot route FCoE.
  • gdahlm - Sunday, November 28, 2010

    You can use software initiators on switches which support 802.3X flow control. Many web-managed switches do support 802.3X, as do most 10GE adapters.

    I am unsure how that would affect performance in a virtualized, shared environment, as I believe it pauses at the port level.

    If your workload is not storage- or network-bound it would work, but I am betting that when you hit that hard knee in your performance curve, things get ugly pretty quickly.
  • DyCeLL - Sunday, December 05, 2010

    Too bad HP Virtual Connect couldn't be tested (a blade option).
    It splits the 10Gb NICs into a maximum of 8 NICs for the blades. It can do this for fiber and Ethernet.
    Check: http://h18004.www1.hp.com/products/blades/virtualc...
  • James5mith - Friday, February 18, 2011

    I still think that 40Gbps Infiniband is the best solution. By far it seems to be the best $/Gbps ratio of any of the platforms. Not to mention it can pass pretty much any traffic type you want.
  • saah - Thursday, March 24, 2011

    I loved the article.

    I just remembered that VMware recently published official drivers for ESX4: http://downloads.vmware.com/d/details/esx4x_intel_...
    The ixgbe version is 3.1.17.1.
    Since the post says that it "enables support for products based on the Intel 82598 and 82599 10 Gigabit Ethernet Controllers", I would like to see the test redone with an 82599-based card and recent drivers.
    Would it be feasible?
