POST A COMMENT

112 Comments

Back to Article

  • edchi - Tuesday, June 26, 2007 - link


    I haven't tried this yet, but will do tomorrow. Here is what Apple suggests to create a better MySQL installation:

    http://docs.info.apple.com/article.html?artnum=303...
    Reply
  • grantma - Tuesday, April 18, 2006 - link

    I found Gnome was a lot more snappy than OS X desktop under Debian PowerPC. You could tell the kernel was far faster using Linux 2.6 - programs would just start immediately. Reply
  • pecosbill - Wednesday, June 15, 2005 - link

    I'm not going to waste my time searching to see if these same comments were already made below, but the summary of them is that those who are performance-oriented tune their code for a CPU. You can do the same for an OS. Also, the "Big Mac" cluster at Virginia Tech speaks otherwise about raw performance, as OS X was the OS of choice. From macintouch.com:

    Okay, stop, I have to make an argument about why this article fails, before I explode. MySQL has a disgusting tendency to fork() at random moments, which is bad for performance essentially everywhere but Linux. OS X server includes a version of MySQL that doesn't have this issue.
    No real arguments that Power Macs are somewhat behind the times on memory latency, but that's because they're still using PC3200 DDR1 memory from 2003. AMD/Intel chips use DDR2 or Rambus now ... this could be solved without switching CPUs.
    The article also goes out of its way to get bad results for PPC. Why are they using an old version of GCC (3.3.x has no autovectorization, much worse performance on non-x86 platforms), then a brand spanking new version of mySQL (see above)? The floating point benchmark was particularly absurd:

    "The results are quite interesting. First of all, the gcc compiler isn't very good in vectorizing. With vectorizing, we mean generating SIMD (SSE, Altivec) code. From the numbers, it seems like gcc was only capable of using Altivec in one test, the third one. In this test, the G5 really shows superiority compared to the Opteron and especially the Xeons"

    In fact, gcc 3.3 is unable to generate AltiVec code ANYWHERE, except on x86 where they added a special SSE mode because x87 floating point is so miserable. This could have been discovered with about 5 minutes of Google research. It wouldn't have had to be discovered at all if they hadn't gone out of their way to use a compiler which is not the default on OS X 10.4. Alarm bells should have been going off in the benchmarker's head when an AMD chip outperforms an Intel one by 3x, but, anyway ...
    I hate to seem like I'm just blindly defending Apple here, but this article seems to have been written with an agenda. There's no way one guy could mess this much up. To claim there's something inherently wrong with OS X's ability to be a server goes against so much publicly available information it's not even funny. Notice Apple seems to have no trouble getting Apache to run with Linux-like performance: [Xserve G5 Performance].
    Anyway ... on a more serious note, a switch of sorts to x86 may not be a hugely insane idea. IBM's ability to produce a low power G5 part seems to be seriously in question, so for PowerBooks Apple is pretty much running out of options. Worse comes to worst - if they started selling x86-powered portables, that might get IBM to work a bit harder to get them faster desktop chips.
    -- "A Macintosh MPEG software developer"
    Reply
  • aladdin0tw - Tuesday, June 14, 2005 - link

    This is the first time I've seen someone use the 'ab' command to conduct a test and then try to tell us something from it.

    In my opinion, ab is not a 'stress test' tool by any measure, especially when you want to draw credible benchmark conclusions from the test. If we can accept 'ab', why would I have to write so much code for a stress test?

    The 'localhost' address is another problematic area: DNS. Why not use a fixed IP as the address? The first rule of benchmarking is to isolate the domain in question, but I cannot see that you obeyed this rule. So how can you interpret your result as a performance fault rather than a DNS-related problem?

    I think you should benchmark again, using some of the good practices from the software industry.

    Aladdin from Taiwan
    Reply
  • demuynckr - Sunday, June 12, 2005 - link

    jhagman, the number in the Apache test table is the number of requests per second that the server handles. Reply
  • jhagman - Wednesday, June 08, 2005 - link

    Hi again, demuynckr.

    Could you please answer me, or preferably add the information to the article: what does the number in the Apache test table mean, and what kind of page was loaded?

    I assumed that the numbers given were hits per second or transfer rate. I've been testing a bit on my powerbook (although with a lower n) and I can very easily beat the numbers you have. So it is apparent that my assumption was wrong.

    BTW, gcc-3.3 on Tiger knows the switch -mcpu=G5
    Reply
  • rubikcube - Wednesday, June 08, 2005 - link

    I thought I would post this set of benchmarks for OS X on x86 vs. PPC. Even though XBench is a questionable benchmark, it is still capable of addressing these questions about Linux on PPC.

    http://www.macrumors.com/pages/2005/06/20050608063...
    Reply
  • webflits - Wednesday, June 08, 2005 - link

    "Yes I have read the article, I also personally compiled the microbenchmarks on linux as well as on the PPC, and I can tell you I used gcc 3.3 on Mac for all compilation needs :)."

    I believe you :)

    But why are the results I get way higher than the numbers listed in the article?
    Reply
  • mongo lloyd - Tuesday, June 07, 2005 - link

    At least the non-ECC RAM, that is. Reply
  • mongo lloyd - Tuesday, June 07, 2005 - link

    Any reason for why you weren't using RAM with lower timings on the x86 processors? Shouldn't there at least have been a disclaimer? Reply
  • jhagman - Tuesday, June 07, 2005 - link

    OK, this clears it up, thanks.

    One little thing still: what is the number you are giving in the ab results table? Is it requests per second or perhaps the transfer rate?

    Reply
  • demuynckr - Tuesday, June 07, 2005 - link

    jhagman:
    As I mentioned before, we used gcc 3.3.3 for all Linux tests, and the gcc 3.3 Mac compiler on Apple, because that was the standard one.
    I did a second flops test with the gcc 4.0 compiler included on the Tiger cd, and the flops are much better when compiled with the -mcpu=g5 option which did not seem available when using the gcc 3.3 Apple compiler.
    As for ab I used these settings:
    ab -n 100000 -c x http://localhost/

    x for the various concurrencies: 5, 20, 50, 100, 150.
    Reply
  • spinportal - Monday, June 06, 2005 - link

    I guess no one is arguing that the PPC isn't keeping pace with the current market; the question is rather whether OS X is able to do big-iron computing. And if the rumors are true, where will you be able to get a PPC Mac built once Apple drops IBM for Intel?
    In a Usenet debate back in '92, Torvalds and Tanenbaum went at it over the Mach-style microkernel vs. the supposed death of monolithic Linux. It seems Linus' work will be seeing more light of day, and Mach will go the way of the dodo. Will Apple rewrite OS X for Intel x86/64? As far as practical business sense goes, that's like shooting oneself in the foot.
    Reply
  • jhagman - Monday, June 06, 2005 - link

    Could you please give the exact method used for testing Apache with ab? It is really hard to try to redo the tests when one does not know which methodology was used. The number of clients and the ab switches used would be appreciated.

    Also, an answer to why Apple's newest gcc (4.0) was not used would be interesting; and did you _really_ use gcc 3.3.3 and not Apple's gcc?

    Other than these omissions I found the article very interesting, thanks.
    Reply
  • demuynckr - Monday, June 06, 2005 - link

    Yes I have read the article, I also personally compiled the microbenchmarks on linux as well as on the PPC, and I can tell you I used gcc 3.3 on Mac for all compilation needs :). Reply
  • webflits - Monday, June 06, 2005 - link

    demuynckr, did you read the article?

    "So, before we start with application benchmarks, we performed a few micro benchmarks compiled on all platforms with the SAME gcc 3.3.3 compiler. "


    BTW I ran the same tests using Apple's version of gcc 3.3
    As you can see, my 2.0GHz now beats the 2.5GHz on 5 of the 8 tests, and a 2.7GHz G5 would be on par with the Opteron 250 when you extrapolate the results.

    Let's face it: Anandtech screwed up by using a crippled compiler for the G5 tests.


    ----------------------------
    GCC 3.3/OSX 10.4.1/2.0GHz G5

    FLOPS C Program (Double Precision), V2.0 18 Dec 1992

    Module Error RunTime MFLOPS
    (usec)
    1 4.0146e-13 0.0140 997.2971
    2 -1.4166e-13 0.0108 648.4622
    3 4.7184e-14 0.0089 1918.5122
    4 -1.2546e-13 0.0139 1076.8597
    5 -1.3800e-13 0.0312 928.9079
    6 3.2374e-13 0.0182 1596.1407
    7 -8.4583e-11 0.0348 344.3954
    8 3.4855e-13 0.0196 1527.6638

    Iterations = 512000000
    NullTime (usec) = 0.0004
    MFLOPS(1) = 827.5658
    MFLOPS(2) = 673.7847
    MFLOPS(3) = 1037.6825
    MFLOPS(4) = 1501.7226
    Reply
  • demuynckr - Monday, June 06, 2005 - link

    Just to clear things up: on linux the gcc 3.3.3 was used, on macintosh gcc 3.3 was used (the one that was included with the OS).
    Reply
  • Joepublic2 - Monday, June 06, 2005 - link

    Wow, pixelglow, that's an awesome way to advertise your product. No marketing BS, just numbers! Reply
  • pixelglow - Sunday, June 05, 2005 - link

    I've done a direct comparison of G5 vs. Pentium 4 here. The benchmark is cache-bound, minimal branching, maximal floating point and designed to minimize use of the underlying operating system. It is also single-threaded so there's no significant advantage to dual procs. More importantly it uses Altivec on G5 and SSE/SSE2 on the Pentium 4, and also compares against different compilers including the autovectorizing Intel ICC.

    http://www.pixelglow.com/stories/macstl-intel-auto...
    http://www.pixelglow.com/stories/pentium-vs-g5/

    Let the results speak for themselves.
    Reply
  • webflits - Sunday, June 05, 2005 - link

    "From the numbers, it seems like gcc was only capable of using Altivec in one test,"

    Nonsense!
    The AltiVec SIMD unit only supports single-precision (32-bit) floating point, and the benchmark uses double-precision floating point.


    Reply
  • webflits - Sunday, June 05, 2005 - link

    Here are the results on a dual 2.0GHz G5 running 10.4.1, using the stock Apple gcc 4.0 compiler.



    [Session started at 2005-06-05 22:47:52 +0200.]

    FLOPS C Program (Double Precision), V2.0 18 Dec 1992

    Module Error RunTime MFLOPS
    (usec)
    1 4.0146e-13 0.0163 859.4752
    2 -1.4166e-13 0.0156 450.0935
    3 4.7184e-14 0.0075 2264.2656
    4 -1.2546e-13 0.0130 1152.8620
    5 -1.3800e-13 0.0276 1051.5730
    6 3.2374e-13 0.0180 1609.4871
    7 -8.4583e-11 0.0296 405.4409
    8 3.4855e-13 0.0200 1498.4641

    Iterations = 512000000
    NullTime (usec) = 0.0015
    MFLOPS(1) = 609.8307
    MFLOPS(2) = 756.9962
    MFLOPS(3) = 1105.8774
    MFLOPS(4) = 1554.0224
    Reply
  • frfr - Sunday, June 05, 2005 - link

    If you test a database, you have to disable the write cache on the disk on almost any OS, unless you don't care about your data. I've read that OS X is an exception because it gives the database software control over it, and that MySQL indeed uses this. This would invalidate all your MySQL results except for OS X.

    Besides, all serious databases run on controllers with battery-backed write cache (and with the write cache on the disks disabled).

    Reply
  • nicksay - Sunday, June 05, 2005 - link

    It is pretty clear that there are a lot of people who want Linux PPC benchmarks. I agree. I also think that if this is to be a "where should I position the G5/Mac OS X combination compared to x86/Linux/Windows" article, you should at least use the default OS X compiler. I got flops.c from http://home.iae.nl/users/mhx/flops.c to do my own test. I have a stock 10.4.1 install on a single 1.6 GHz G5.

    In the terminal, I ran:
    gcc -DUNIX -fast flops.c -o flops

    My results:

    FLOPS C Program (Double Precision), V2.0 18 Dec 1992

    Module Error RunTime MFLOPS
    (usec)
    1 4.0146e-13 0.0228 614.4905
    2 -1.4166e-13 0.0124 565.3013
    3 4.7184e-14 0.0087 1952.5703
    4 -1.2546e-13 0.0135 1109.5877
    5 -1.3800e-13 0.0383 757.4925
    6 3.2374e-13 0.0220 1320.3769
    7 -8.4583e-11 0.0393 305.1391
    8 3.4855e-13 0.0238 1258.5012

    Iterations = 512000000
    NullTime (usec) = 0.0002
    MFLOPS(1) = 736.3316
    MFLOPS(2) = 578.9129
    MFLOPS(3) = 866.8806
    MFLOPS(4) = 1337.7177


    A quick add-n-divide gives my system an average result of 985.43243.

    985. On a single 1.6 G5.

    So, the oldest, slowest PowerMac G5 ever made almost matches a top-of-the-line dual 2.7 G5 system?

    To quote, "Something is rotten in the state of Denmark." Or should I say the state of the benchmark?
    Reply
  • Eug - Saturday, June 04, 2005 - link

    BTW, about the link I posted above:

    http://lists.apple.com/archives/darwin-dev/2005/Fe...

    The guy who wrote that is the creator of the BeOS file system (and who now works for Apple).

    It will be interesting to see if this is truly part of the cause of the performance issues.

    Also, there is this related thread from a few weeks back on Slashdot:

    http://hardware.slashdot.org/article.pl?sid=05/05/...
    Reply
  • profchaos - Saturday, June 04, 2005 - link

    The statement about Linux kernel modules is incorrect. It is a popular misconception that kernel modules make the Linux kernel something other than purely monolithic. The module loader links module code in kernelspace, not in userspace, the advantage being dynamic control of the kernel's memory footprint. Although some previously kernelspace subsystems, such as devfs, have recently been rewritten as userspace daemons, such as udev, the Linux kernel is for the most part a fully monolithic design.

    The theories that fueled the monolithic vs. microkernel flame wars of the mid-90s were nullified by the rapid ramping of single-thread performance relative to memory subsystems. From the perspective of the CPU, a context switch takes an eternity, since modifying kernel data structures in main memory is so slow relative to everything else. Userspace context switching in microkernel designs is based on IPC, and may require several context switches in practice.

    As you can see from the results, Linux 2.6 wipes the floor with Darwin, just as it does with several of the BSDs (especially OpenBSD and FreeBSD 4.x) and its older cousin, Linux 2.4. It's also anyone's guess whether the Linux 2.6 systems were using pthreads (from NPTL) or linuxthreads in glibc. It takes a heavyweight UNIX server system, which today means IBM AIX on POWER, HP-UX on Itanium, or to a lesser degree Solaris on SPARC, to best Linux 2.6 under most server workloads. Reply
  • Eug - Saturday, June 04, 2005 - link

    Responses/Musings from an Apple developer.

    http://ridiculousfish.com/blog/?p=17
    http://lists.apple.com/archives/darwin-dev/2005/Fe...

    Also:

    They claim that making a new thread is called "forking". No, it’s not. Calling fork() is forking, and fork() makes processes, not threads.

    They claim that Mac OS X is slower at making threads by benchmarking fork() and exec(). I don’t follow this train of thought at all. Making a new process is substantially different from making a new thread, less so on Linux, but very much so on OS X. And, as you can see from their screenshot, there is one mySQL process with 60 threads; neither fork() nor exec() is being called here.

    They claim that OS X does not use kernel threads to implement user threads. But of course it does - see for yourself.
    /* Create the Mach thread for this thread */
    PTHREAD_MACH_CALL(thread_create(mach_task_self(), &kernel_thread), kern_res);

    They claim that OS X has to go through "extra layers" and "several threading wrappers" to create a thread. But anyone can see in that source file that a pthread maps pretty directly to a Mach thread, so I’m clueless as to what "extra layers" they’re talking about.

    They guess a lot about the important performance factors, but they never actually profile mySQL. Why not?
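    The fork()-vs-pthread distinction made above can be shown concretely. Here is a minimal C sketch (an editor's illustration, not from the original comment; the helper names are made up):

    ```c
    /* Illustrates the comment's point: fork() creates a new process with a
       copy of the address space, while pthread_create() creates a thread
       that shares it. Compile with: gcc -o demo demo.c -lpthread */
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int shared = 0;

    static void *worker(void *arg) {
        (void)arg;
        shared = 42;   /* visible to the creator: threads share memory */
        return NULL;
    }

    /* Value of `shared` seen by the creator after a thread wrote to it. */
    int value_after_thread(void) {
        pthread_t t;
        shared = 0;
        pthread_create(&t, NULL, worker, NULL);
        pthread_join(t, NULL);
        return shared;   /* 42 */
    }

    /* Value of `shared` seen by the parent after a fork()ed child wrote
       to its own copy of it. */
    int value_after_fork(void) {
        pid_t pid;
        shared = 0;
        pid = fork();
        if (pid == 0) {
            shared = 42;   /* modifies the child's copy only */
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        return shared;   /* still 0: separate address spaces */
    }

    int main(void) {
        printf("after thread: %d\n", value_after_thread());
        printf("after fork:   %d\n", value_after_fork());
        return 0;
    }
    ```

    Benchmarking pthread_create() rather than fork()/exec() would have measured what the article claimed to measure.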
    Reply
  • orenb - Saturday, June 04, 2005 - link

    Thank you for a very interesting article. A follow up on desktop and workstation performance will be very appreciated... :-)

    Good job!
    Reply
  • seanp789 - Saturday, June 04, 2005 - link

    Well, that's great and all, but your news says Apple is switching to Intel, so I don't think much will be changing in the PowerPC lineup. Reply
  • Brazilian Joe - Saturday, June 04, 2005 - link

    I would like to see this article re-done with more benchmarks to give a clearer picture. I think Mac OS X should be pitted against Darwin on the PPC platform, since there may be hidden differences. Darwin works on x86 too (and x86_64?), so it would be very interesting to see the SAME OS on the Mac platform running on different hardware. And with the software compiled with the same compiler present on Darwin, we should get a more consistent result. Linux and BSD should not be ditched, however. The performance difference of Linux/FreeBSD/OpenBSD on PPC vs. PC is also a very interesting subject to investigate.
    I think this article, along with all the complaints of inconsistency in the results, should fuel a new series of articles: one just comparing Darwin/Mac OS X on both platforms; another for Linux, using a GCC version as close as possible to that used on Darwin; another for FreeBSD; and yet another for OpenBSD. The last article would gather everything and summarize. I think this would be much more complete and satisfying/informative for the readers.
    Reply
  • iljitsch - Saturday, June 04, 2005 - link

    There seems to be considerable confusion between threads and processes in the review. I have no trouble believing that MacOS doesn't do so well with process gymnastics, but considering the way Apple itself leverages threads, I would assume those perform much better.

    I don't understand why Apache 1.3 was used here; Apache 2.0 has much better multiprocessor capabilities and would have allowed testing the difference between the requests-are-handled-in-processes and requests-are-handled-in-threads ways of doing things.
    Reply
  • Phil - Saturday, June 04, 2005 - link

    #79: Wow. I had no idea that they were actually going to do it, I had assumed that it was typical industry nonsense!
    If this is true, then IMHO Apple won't be in much of a better position (with regards to this article) as they'll still need to work on the OS, regardless.

    Can anyone speculate as to why they *really* want to switch to x86/Intel? I wonder if they'll consider AMD too...
    Reply
  • rorsten - Saturday, June 04, 2005 - link

    Uhm, the estimation for power consumption is completely wrong. The only significant CMOS power consumption - especially for an SOI chip - is the current required to charge or discharge the gates of the FETs, which only happens when a value changes (the clock accounts for most of the power consumption on a modern synchronous chip). Since we're talking about current only, this is purely resistive power, I^2R style, and since the current is related to the number of transitions per second, increasing the clock rate linearly increases the current which quadratically increases the power consumption. Reply
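    For reference (an editor's sketch, not part of the comment above), the standard first-order model of dynamic CMOS power is:

    ```latex
    P_{\text{dyn}} = \alpha \, C \, V_{dd}^{2} \, f
    ```

    where \alpha is the activity factor (fraction of gates switching per cycle), C the switched capacitance, V_{dd} the supply voltage, and f the clock frequency. At a fixed voltage, power grows linearly with clock rate; the quadratic dependence comes from the supply voltage, which in practice often has to rise along with frequency.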
  • kamper - Saturday, June 04, 2005 - link

    Here's another story about Apple and Intel from cnet:
    http://news.com.com/Apple+to+ditch+IBM%2C+switch+t...

    Interesting in the context of this article but I won't believe it without much more substantial proof :)

    +1 on getting a db test using the same os on all architectures whether it be linux or bsd

    +1 on fixing the table so that it renders in firefox
    Reply
  • shanep - Saturday, June 04, 2005 - link

    Re: NetBSD.

    Sorry, I just noticed it is not supported yet by NetBSD.

    Forget I mentioned it.
    Reply
  • shanep - Saturday, June 04, 2005 - link

    "Wessonality: Our next project if we can keep the G5 long enough in the labs."

    How about testing these machines with NetBSD 2.0.2, to keep the hardware comparison on as close to an equal footing as possible?

    This should mostly remove many red herrings associated with multiple differences in software across different hardware.
    Reply
  • michaelok - Saturday, June 04, 2005 - link

    "i've had for awhile about OS X server handling large numbers of threads. My OS X servers ALWAYS tank hard with lots of open sessions, so i keep them around only for emergencies."

    Moshe Bar (openMosix) has been an avid Mac follower for years. I see he has a few suggestions for OS X, including ditching Mach so you can run FreeBSD natively, which has much better performance. In fact, thread performance is one of FreeBSD's strong points, although Linux has largely caught up.

    Also, look up his Byte articles; you can see how a proper comparison can be done, although he does not claim to be a benchmarking expert.

    http://www.moshebar.com/tech/?q=node/5
    http://www.byte.com/documents/s=7865/byt1064782374...
    Reply
  • johannesrexx - Saturday, June 04, 2005 - link

    Everybody should use Firefox by default because it's far more secure. Use IE only when you must. Reply
  • michaelok - Saturday, June 04, 2005 - link

    "with one benchmark showing that the PowerMac is just a mediocre PC while another shows it off as a supercomputer, the unchallenged king of the personal computer world."

    Well, things are a little different when you connect, say, 32768 processors together, i.e. you go from running MySQL to Teradata, so yes, the Power architecture seems to dominate, and the Virginia Tech supercomputer is still up there, at 7th.

    http://www.top500.org/lists/plists.php?Y=2004&...

    " The RISC ISA, which is quite complex and can hardly be called "Reduced" (The R of RISC), provides 32 architectural registers"

    'Reduced Instruction Set' is misleading, it actually refers to a design philosophy of using *smaller, simpler* instructions, instead of a single complex instruction. This is to be compared with the Itanium for example, which Intel calls 'EPIC' (Explicit Parallel Instruction Computing), but it is essentially derived from VLIW (Very Long Instruction Word).

    Anyway, nice article, certainly much more to discuss here, such as SMT (Simultaneous Multithreading), (when that is available for the Apple :), vs. Intel's Hyperthreading. We'll still be comparing Apples to Oranges but isn't that why everybody buys the Motor Trend articles, i.e. '68 Mustang vs. '68 GT?


    Reply
  • psychodad - Saturday, June 04, 2005 - link

    I agree. Recently I read a review which pitted macs against pcs using software blatantly optimized for macs. If you have ever used unoptimized software, you will know it. It is slow, often unstable and not at all usable, especially if you're after productivity. Reply
  • Viditor - Friday, June 03, 2005 - link

    IntelUser2000 - "about the AMD TDP number, they never state that its max power, they say its maximum power achievable under most circumstances, its not absolute max power"

    Not true at all...AMD's datasheet clearly states that it's not only max power, but max theoretical power.
    http://www.amd.com/us-en/assets/content_type/Downl...
    Reply
  • trooper11 - Friday, June 03, 2005 - link

    I think it's hard enough comparing a G5 to PC systems. I don't believe there will ever be a 'fair' comparison that satisfies everyone on both sides. There are too few general programs to compare, and people will always complain about using or not using optimized apps for either platform. Many of the variables are subjective, and the benchmarks to be compared are heavily debated without a clear answer.

    I think this was a good attempt, but I gave up trying to 'fairly' compare the two a long time ago. Anything that sheds a bit of light is a good thing, but I never expect an end to the controversy; too many questions can't be answered.

    I would, though, love to see the addition of dual-core AMD chips, since they are out there and would be serious competition; of course they would fly in server applications. Hopefully the numbers for that could be added in a later article.
    Reply
  • psychodad - Friday, June 03, 2005 - link

    Fascinating. You run these tests using a compiler that Apple does not use (unless it is Yellow Dog) against software generally optimized for x86 architectures and you make conclusions. This makes your data tainted (actually biased) and your conclusions faulty. I would suggest that in fairness you make your tests more "real world" by using the software compiled by compilers that the rest of us nontechnical people use on a daily basis. Reply
  • smitty3268 - Friday, June 03, 2005 - link

    Rosyna:
    Oh, I assumed he was using the Apple version of gcc. If not, then I see what you mean.
    Reply
  • crimsonson - Friday, June 03, 2005 - link

    This article may be moot by Monday

    http://tinyurl.com/7ex4v
    Reply
  • Garyclaus16 - Friday, June 03, 2005 - link

    " Oh and the graph on page 5 doesnt display correctly in firefox. "

    AND you are using firefox for what reason?...you deserve to view pages incorrectly
    Reply
  • Rosyna - Friday, June 03, 2005 - link

    smitty3268, that's part of the problem. Almost no one uses GCC 3.3.3 (stock, from the main gcc branch) for Mac OS X development because it really sucks at optimizing for the PPC. On the other hand, OS X was compiled with the Apple shipped GCC 3.3/GCC 4.0. Reply
  • smitty3268 - Friday, June 03, 2005 - link

    I think it's fair to use the compilers most people are going to be using. That would be gcc on both platforms. As for autovectorization in 4.0, don't expect very much from it. Obviously it will be better than 3.3, but the real work is being added now in 4.1.

    I'll join the other 50 posters who would have liked to see at least 1 page showing the G5's performance under linux compared to OSX. That and maybe a few more real world benchmarks. But your article was very informative and answered a lot of questions. It was frustrating that there really wasn't anything done like this before.
    Reply
  • Rosyna - Friday, June 03, 2005 - link

    Actually, for better or worse the GCC Apple includes is being used for most Mac OS X software. OS X itself was compiled with it. Reply
  • elvisizer - Friday, June 03, 2005 - link

    rosyna's right.
    i'm just not sure if there IS any way to do the kind of comparison you seem to have been shooting for (pure competition between the chips with as little else affecting the outcome as possible). you could use the 'special' compilers on each platform, but those aren't used for compiling most of the binaries you buy at CompUSA.
    Reply
  • elvisizer - Friday, June 03, 2005 - link

    why didn't you run some tests with YD Linux on the G5?! you could've answered the questions you posed yourself!
    argh.
    and you definitely should've included After Effects. "we don't have access to that software" - what the heck is THAT about?? you can get your hands on a dual 3.6 Xeon machine, a dual 2.5 G5, and a dual 2.7 G5, but you can't buy a freaking piece of Adobe software at retail?!
    some seriously weird decisions being made here.
    other than that, the article was ok. re-confirmed suspicions i've had for awhile about OS X server handling large numbers of threads. My OS X servers ALWAYS tank hard with lots of open sessions, so i keep them around only for emergencies. They are so very easy to admin, tho, they're still attractive to me for small workgroup sizes. like last month, I had to support 8 people working on a daily magazine being published at E3, literally inside the convention center. os x server was perfect in that situation.
    Reply
  • Rosyna - Friday, June 03, 2005 - link

    There appears to be either a typo or a horrible flaw in the test. It says you used GCC 3.3.3 but OS X comes with gcc version 3.3 20030304 (Apple Computer, Inc. build 1809).

    If you did use GCC 3.3.3 then you were giving the PPC a severe disadvantage as the stock GCC has almost no optimizations for PPC while it has many for x86.
    Reply
  • Eug - Friday, June 03, 2005 - link

    "But do you really think that Oracle would migrate to this if it wasn't on a par?"

    [Eug dons computer geek wannabe hat]

    There are lots of reasons to migrate, and I'm sure absolute performance isn't always the primary concern. We won't know the real performance until we actually see tests on Oracle/Sybase.

    My uneducated guess is that they won't be anywhere near as bad as the artificial server benches might suggest, but OTOH, I could easily see Linux on G5 significantly besting OS X on G5 for this type of stuff.

    ie. The most interesting test I'd like to see is Oracle on the G5, with both OS X and Linux, compared to Xeon and Opteron with Linux.

    And yeah, it would be interesting to see what gcc 4 brings to the table, since 3.3 provides no autovectorization at all. It would also be interesting to see how xlc/xlf does, although that doesn't provide autovectorization either. Where are the autovectorizing IBM compilers that were supposed to come out???
    Reply
  • melgross - Friday, June 03, 2005 - link

    As none of us has actual experience with this, none of us can say yes or no.

    But do you really think that Oracle would migrate to this if it wasn't on a par? After all Ellison isn't on Apple's board anymore, so there's nothing to prove there.

    I also remember that going back to Apple's G4 XServes, their performance was better than the x86 crowd, and the Sun servers as well. Those tests were on several sites. Been a while though.
    Reply
  • JohanAnandtech - Friday, June 03, 2005 - link

    querymc: Yes, you are right. The --noaltivec flag and the comment that AltiVec was enabled by default in the gcc 3.3.3 compiler docs made me believe there was autovectorization (or at least "scalarization"). As I wrote in the article, we used -O2 and then tried a bucketload of other options like -ffast-math, -mtune=G5 and others I don't remember anymore, but it didn't make any big difference. Reply
  • querymc - Friday, June 03, 2005 - link

    The SSE support would probably also be improved by using GCC 4 with autovectorization, I should note. There's a reason it does poorly in GCC 3. :) Reply
  • querymc - Friday, June 03, 2005 - link

    Johan: I didn't see this the first time through, but you need to make a slight clarification to the floating point stuff. There is no autovectorization capability in GCC 3.3. None. There is limited support for SSE, but that is not quite the same, as SSE isn't SIMD to the extent that AltiVec is. If you want to use the AltiVec unit in otherwise unaltered benchmarks, you don't have a choice other than GCC 4 (and you need to pass a special flag to turn it on).

    Also, what compiler flags did you pass on each platform? For example, did you use -ffast-math?
    Reply
  • JohanAnandtech - Friday, June 03, 2005 - link

    Melgross: Apple told me that most Xserves in Europe are sold as "do it all" machines: a little web server (Apache), a Sybase database, Samba, and so on. They didn't have any client with heavy traffic on the web server, so nobody complains.

    Sybase/Oracle seem to have done quite a bit of work to get good performance out of Mac OS X, so it would be interesting to see how they managed to solve those problems. But I am sceptical that Oracle/Sybase runs faster on Mac OS X than on Linux.
    Reply
  • Icehawk - Friday, June 03, 2005 - link

    Interesting stuff. I'd like to see more data too. Mmm Solaris.

    Unfortunately, the diagrams weren't labeled for the most part (in terms of "higher is better"), making it difficult to interpret the results.

    And the whole not displaying on FF properly... come on.
    Reply
  • NetMavrik - Friday, June 03, 2005 - link

    You can say that again! NT shares a whole lot more than just similarities with VMS. There are entire structures that are copied straight from VMS. I think most people have forgotten, or never knew, what "NT" stood for anyway. Take VMS, increment each letter by one, and you get WNT! New Technology my a$$. Reply
  • Guspaz - Friday, June 03, 2005 - link

    Good article. But I'd like to see it re-done with the optimal compiler per-platform, and I'd like to see PowerPC Linux used to confirm that OSX is the cause of the slow MySQL performance. Reply
  • melgross - Friday, June 03, 2005 - link

    I was just thinking back about this and remembered something I've seen.

    Computerworld has had articles over the past two years or so about companies who have gone to Xserves. They are using them with Apache, Sybase or Oracle. I don't remember any complaints about performance.

    Also Oracle itself went to XServes for its own datacenter. Do you think they would have done that if performance was bad? They even stated that the performance was very good.

    Something here seems screwed up.
    Reply
  • brownba - Friday, June 03, 2005 - link

    johan, i always appreciate your articles.

    you've been /.'d !!!!
    and anandtech is holding up well.
    good job
    Reply
  • bostrov - Friday, June 03, 2005 - link

    Since so much effort has gone into vector facilities and instruction sets ever since the P54 days, shouldn't "best effort" on each CPU be used (the IBM compiler on the G5 and the Intel compiler on x86)? By using gcc you're using an almost artificially bad compiler, and there is no guarantee that gcc provides equivalent optimizations for each platform anyway.

    I think it'd be very interesting to see an article with the very best available compilers on each platform running the benchmarks.

    Incidentally, Intel C with the vector instruction sets disabled still does better.
    Reply
  • JohanAnandtech - Friday, June 03, 2005 - link

    bostrov: Because the Intel compiler is superb at vectorizing code. I am testing the x87 FPU with gcc; you are testing SSE2 performance with the Intel compiler. Reply
  • JohanAnandtech - Friday, June 03, 2005 - link

    minsctdp: A typo which happened during final proofread. All my original tables say 990 MB/s. Fixed now. Reply
  • bostrov - Friday, June 03, 2005 - link

    My own results for FLOPS 2.0 (compiled with Intel C 8.1, 3.2GHz Prescott with 160MHz - 5:4 ratio - FSB):

    flops20-c_prescott.exe

    FLOPS C Program (Double Precision), V2.0 18 Dec 1992

    Module      Error          RunTime (usec)    MFLOPS
       1         1.7764e-013        0.0109      1288.7451
       2        -1.4166e-013        0.0082       852.7242
       3         8.1046e-015        0.0067      2531.7045
       4         9.0483e-014        0.0052      2858.2062
       5        -6.2061e-014        0.0140      2065.6650
       6         3.3640e-014        0.0100      2906.2439
       7        -5.7980e-012        0.0327       366.4559
       8         3.7692e-014        0.0111      2700.8968

    Iterations = 512000000
    NullTime (usec) = 0.0000
    MFLOPS(1) = 1088.7826
    MFLOPS(2) = 854.7579
    MFLOPS(3) = 1609.7508
    MFLOPS(4) = 2753.5016

    Why are the AnandTech results so poor?
    Reply
  • melgross - Friday, June 03, 2005 - link

    I thought that GCC comes with Tiger. I have read Apple's own info, and it definitely mentions GCC 4. Perhaps that would help the vectorization process.

    Altivec is such an important part of the processor and the performance of the machine that I would like to see properly written code used to compare these machines.
    Reply
  • Reflex - Friday, June 03, 2005 - link

    NT was designed primarily by Dave Cutler, who was one of the guys behind VMS at DEC. NT is not based on Mach and has no relation to it, although it shares some similarities with BSD and VMS. Reply
  • tfranzese - Friday, June 03, 2005 - link

    #35, Apple's platform uses HT links (don't ask me specifics). Reply
  • minsctdp - Friday, June 03, 2005 - link

    What's with the 24 MB/s memory write speed on the Xeon, vs. nearly 2 GB/s for the others? Looks bogus. Reply
  • querymc - Friday, June 03, 2005 - link

    I'd still like to see a Linux on G5 test. Without one, we still don't know for sure whether the bad performance is due to OS X or the hardware. And it's definitely useful for G5 owners to know whether they can expect Linux to improve server performance. Reply
  • querymc - Friday, June 03, 2005 - link

    NT is not built on Mach. NT itself was originally a microkernel-based OS, derived from the design of DEC's VMS via Dave Cutler, the lead architect of both. It's currently very monolithic, a bit more so than OS X, because they stuffed a lot of userspace cruft from Windows 9x into the XP kernel for binary compatibility.

    Rick Rashid (sp?) was one of the co-developers of Mach, and he went to Microsoft, which is probably what OddTSi is referring to. I don't recall whether he went to research or the OS group, though. Either way, NT has no Mach code and does not share Mach's design.
    Reply
  • Netopia - Friday, June 03, 2005 - link

    OddTSi (Poster 37) -- do you have any supporting data for saying that NT is built on Mach?

    Joe
    Reply
  • AluminumStudios - Friday, June 03, 2005 - link

    Interesting article. I wish you hadn't left out After Effects, though, because I use it heavily and I'd love to see a comparison between the Mac and x86 on it. Reply
  • OddTSi - Friday, June 03, 2005 - link

    There's a semi-big error in your discussion on page 7. NT (and the subsequent Windows OSes based on it) is NOT a monolithic OS. In fact NT is BASED ON MACH. The main developer for the Mach micro-kernel was one of the lead developers of NT. Reply
  • octanelover - Friday, June 03, 2005 - link

    I think it would be interesting, on the server side of things, to include Solaris 10 on Opteron in your benchmark list. Seeing as how Solaris is still a major player in the server world, it would be nice to see how it fares alongside Linux and Mac OS X.

    By the way, this article, IMHO, is darn near groundbreaking. Excellent work and very illuminating.
    Reply
  • exdeath - Friday, June 03, 2005 - link

    And before we talk about 10GB/sec buses, don't forget that the Opteron has up to three HyperTransport links.

    And the HyperTransport spec allows for up to about 22GB/sec aggregate per channel (11GB/sec in each direction).
    Reply
  • exdeath - Friday, June 03, 2005 - link

    Wow look at a 2.4 GHz Opteron clean house.

    I'd like to see what a 2.6 GHz FX-55 with unregistered memory would do ;) I'll be fair and say keep it at 2.6 GHz stock ;)
    Reply
  • bersl2 - Friday, June 03, 2005 - link

    Right. GCC 4.0 has an all-new optimization framework, including autovectorization:

    http://gcc.gnu.org/projects/tree-ssa/vectorization...
    Reply
  • Pannenkoek - Friday, June 03, 2005 - link

    It is well known that GCC 3.3 can't vectorize code. However, GCC 4 should be able to, eventually if not already.

    The small cache of the G5 would hamper its server performance I'd reckon, regardless of other factors.
    Reply
  • jimbailey - Friday, June 03, 2005 - link

    I'm curious whether you rebuilt Apache and MySQL from source. Apple has added a significant amount of optimization to gcc, and I would love to know if it was included in this test. I don't doubt the results, though. The trade-off for using the Mach microkernel is well known. Reply
  • rubikcube - Friday, June 03, 2005 - link

    Johan, I agree that all the facts point to your conclusions being accurate. I would bet all the money in the world that you are correct. However, this hypothesis is easily confirmed by running MySQL on a G5 running Linux. Reply
  • Olaf van der Spek - Friday, June 03, 2005 - link

    > In Unix, this is done with a Syscall, and it results in two context switches (the CPU has to swap out one process for another)

    Does it?
    As far as I know, it doesn't. The page tables don't need to be swapped, and neither does the CPU state. The CPU gets access to kernel data because it switches to kernel mode, but that doesn't require a full context switch, I think.
    Reply
  • WileCoyote - Friday, June 03, 2005 - link

    Tough crowd... Reply
  • Eug - Friday, June 03, 2005 - link

    Of the stuff I understand, I agree with your conclusions, but I think it's reasonable to state that running Linux on the G5 yourself would have been the most definitive test.

    Anyways, I like fusion food. :)
    Reply
  • cHodAXUK - Friday, June 03, 2005 - link

    Great article, a very educational read, and it was very interesting to see what is holding the G5 back. IBM/Apple really need to address these issues; people are paying a lot of money for G5s that are delivering nowhere near the level of performance that they *theoretically* should. Reply
  • Netopia - Friday, June 03, 2005 - link

    WOW... great article.

    I too would like to see Yellow Dog (Or FC4) loaded on the G5 for a true head-to-head. I hope you have the time with the box to get 'er done!

    Joe
    Reply
  • tfranzese - Friday, June 03, 2005 - link

    Kind of snappy there Johan.

    I do prefer numbers coming from one source myself.
    Reply
  • JohanAnandtech - Friday, June 03, 2005 - link

    Rubikcube: Speculative? Firstly, both a web server and a database server show terrible performance. Secondly, LMbench shows there is definitely a problem with creating threads. So everything points to our "speculative" conclusion.

    Thirdly, as mentioned in an earlier post:
    http://www-106.ibm.com/developerworks/linux/librar...
    is another indication that there is nothing speculative about our conclusion.

    Reply
  • rubikcube - Friday, June 03, 2005 - link

    #21 I disagree. Most of the end of the article on the threading problems was speculative. We can't say that's the cause without actual testing. Reply
  • Jalf - Friday, June 03, 2005 - link

    To those wanting a Linux-on-G5 test, keep in mind the entire purpose of this article: it was to test the performance of a Mac computer running a Mac OS, compared to an Intel/AMD PC.

    So while installing Linux on the G5 would give us a better idea of how the CPU itself performs, it would also leave out the huge effect the OS has (you wouldn't have seen the huge performance problems with threading, for example).
    Reply
  • Jalf - Friday, June 03, 2005 - link

    #11: Not true. If you browse AMD's documentation for a bit, they do say that their TDP *is* the absolute max power.
    Intel uses the "maximum power achievable under most circumstances" method, though.
    Reply
  • rubikcube - Friday, June 03, 2005 - link

    I agree that Linux should have been used for a more normalized comparison. I also think that you should have tried running your MySQL tests on Darwin on x86. You might have been able to find the cause of the performance anomalies. Reply
  • Sabresiberian - Friday, June 03, 2005 - link

    I find it hilarious that someone calling him- or herself 'porkster' is complaining about someone else's language :)

    Apple's computers have made their fame on their user-friendliness, so I think it is very appropriate to compare these computers with OS X on the Apples, as that's where the user-friendliness resides, and both OSes are in the same family. It would have been fun to compare using 64-bit Win XP Pro - I bet we would all get a good laugh out of that. Microsoft is determined, I think, to make a Linux man out of me yet :)

    Reply
  • kresek - Friday, June 03, 2005 - link

    waiting for AnandTech's YDL results, have a look at this:

    http://www-106.ibm.com/developerworks/library/l-yd...
    Reply
  • SMOG - Friday, June 03, 2005 - link

    #13 Thresher: "When it comes down to it, performance is important, but not the only reason people buy what they buy. I would say more often than not, the decision is made with only a modicum of logic."

    You're right, and those people didn't read this article; at best they read the first page and then skipped to the last to see whether he bashed Apple or not. This article was for those who want to know just what the power of the PowerPC actually is. This is a technical article, not a buyer's guide. This is science.
    Good job.
    Reply
  • CU - Friday, June 03, 2005 - link

    You mentioned that most people don't use the Intel compiler, but it would have been nice to see it, and the Windows and IBM compilers as well. Reply
  • michael2k - Friday, June 03, 2005 - link

    Well, it shouts to stay away from the XServe unless you happen to have vectorizable code that you have the resources to properly vectorize! Reply
  • erwos - Friday, June 03, 2005 - link

    Excellent comparison of the platforms, although I wish they had spent more time analyzing the graphs.

    Like the others, I would have liked to see a G5/Linux benchmark (now that FC4 has a PPC version, you could run a fairly reasonable one), but I admit it's not a very popular option compared to x86. My curiosity is whether Mac OS X is the problem, or whether it's some sort of issue with the CPU itself. It seems unlikely the G5 would have such a fundamental flaw, but it does shout to stay the hell away from the Xserve until these issues are resolved.

    -Erwos
    Reply
  • Thresher - Friday, June 03, 2005 - link

    As the owner of Intel, AMD, and Mac-based computers, I have to say this is one of the best and most thorough comparisons I've seen.

    You did an excellent job of isolating CPU and OS performance.

    That being said, if performance were the only indicator, there is no doubt in my mind that AMD would be ruling the roost. However, personal preferences come into play a great deal. Businesses like the reputation behind Intel. I prefer the usability of Mac OS X. People have strong feelings about Microsoft that may color their decisions.

    When it comes down to it, performance is important, but not the only reason people buy what they buy. I would say more often than not, the decision is made with only a modicum of logic.
    Reply
  • Cruise51 - Friday, June 03, 2005 - link

    I'd be interested in seeing how it performs on Yellow Dog as well. Reply
  • IntelUser2000 - Friday, June 03, 2005 - link

    People, in case some of you misunderstand: the 10.8GB/sec full-duplex bus means that it's two 32-bit 1350MHz buses, rather than the one 64-bit bus in PCs. It's not 10.8GB/sec x 2 = 21.6GB/sec; it's a 10.8GB/sec bus (or, more correctly stated, 5.4GB/sec x 2). Plus, Apple's site says that it has TWO (yes, two!!!) of the 10.8GB/sec buses, one per CPU.

    Summary: Per CPU = 10.8GB/sec
    Per dual-processor system = 21.6GB/sec


    Johan, about the AMD TDP number: they never state that it's max power; they say it's the maximum power achievable under most circumstances, not absolute max power.
    Reply
  • JohanAnandtech - Friday, June 03, 2005 - link

    Porkster: It is a little geekish Unix joke. Where is your geek spirit, man?

    Wessonality: That is our next project, if we can keep the G5 in the lab long enough.

    Ailleur2: indeed, I agree. The G5 is a potent CPU with a lot of potential. Just give it a bigger L2 and a better memory subsystem. This is an architecture that could last very long by applying a few tweaks, like the P6.

    Methodical: All of the benchmarks are trustworthy; they should be looked upon as a whole to get a good picture, not judged by picking just one. About After Effects: I indicated that the G5 does very well there (I have seen other reports on the web); I just didn't have the software in the lab.

    I also warned that this was not about "should I buy an Apple or not?". It is just "if performance is what counts for me, where should I position the G5/Mac OS X combination compared to x86/Linux/Windows?".

    Reply
  • StuckMojo - Friday, June 03, 2005 - link


    hmph. you say it yourself in the last paragraph...how come you didn't try it?
    Reply
  • StuckMojo - Friday, June 03, 2005 - link

    Yes, it seems you've left out a very good method of testing whether OS X is the issue: run a PowerPC Linux distro with the MySQL and Apache benchmarks and see what happens!

    I'd be _really_ interested in the results. See if you can update the article with them.
    Reply
  • porkster - Friday, June 03, 2005 - link

    "Root Me" in Australian slang is the same as "Fxxk Me" in common language. Some people my find a picture in this review offensive. Reply
  • wessonality - Friday, June 03, 2005 - link

    What about installing Yellow Dog Linux on the XServe? Reply
  • ailleur2 - Friday, June 03, 2005 - link

    Oh, and the graph on page 5 doesn't display correctly in Firefox. Reply
  • ailleur2 - Friday, June 03, 2005 - link

    Well, that was interesting.

    I'm a big Apple fan myself, but even I never thought of putting OS X Server in a server room.
    I think the G5 did quite well, and had IBM delivered on its promise of a 3GHz G5 (that was supposed to happen a year ago), the G5 would have won a couple of tests by a good margin.

    If Apple/IBM want AltiVec optimizations, I think they'll have to do them themselves, since the interest level is pretty low.

    One question, though: why wasn't Linux installed on the G5 if this was a CPU test? I don't know if it makes a damn of a difference, but it would have put them on an equal basis.
    Reply
  • Methodical - Friday, June 03, 2005 - link

    I like Anand's articles way better.

    You're drawing too many conclusions from data you basically call untrustworthy, but I agree with your basic conclusion: the OS still needs more work.

    I really think leaving out After Effects was a bad idea. It's a perfect benchmark: plugins that do the exact same calculations on the exact same work files. It's also one of the biggest things these Macs are used for, but I understand your article to be a bit more server-oriented.
    Reply

Log in

Don't have an account? Sign up now