Next up, we'll look at floating point performance.

Flops, programmed by Al Aburto, is a very floating-point intensive benchmark. Analyses show that this benchmark contains:

70% floating point instructions;
only 4% branches; and
only 34% memory instructions.
Note that some of those FP instructions are also memory instructions, which is why the three percentages add up to more than 100%. Flops is not a real-world benchmark, but it isolates the raw power of the FPU.

Al Aburto, about Flops:
" Flops.c is a 'C' program which attempts to estimate your systems floating-point 'MFLOPS' rating for the FADD, FSUB, FMUL, and FDIV operations based on specific 'instruction mixes' (see table below). The program provides an estimate of PEAK MFLOPS performance by making maximal use of register variables with minimal interaction with main memory. The execution loops are all small so that they will fit in any cache."
Flops shows the peak double-precision performance of the core by making sure that the program fits in the L1 cache. Flops consists of 8 tests (modules), each with a different but well-known instruction mix. The most frequently used instructions are FADD (addition), FSUB (subtraction) and FMUL (multiplication).
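Flops reports its result as MFLOPS: the number of floating-point operations executed, divided by the elapsed time in microseconds. As a rough illustration of that bookkeeping, here is a minimal Python sketch. The kernel `fp_kernel`, its 2:1:1 FADD/FSUB/FMUL mix and the helpers `instruction_mix` and `mflops` are hypothetical names made up for this example, not Aburto's code. Python's interpreter overhead dwarfs the cost of the arithmetic itself, so any MFLOPS figure it prints says nothing about the FPU; only the accounting carries over.

```python
import time


def fp_kernel(iterations: int) -> dict[str, int]:
    """Hypothetical Flops-style loop: like Flops' register variables, every
    operand is a local scalar, so the loop does not walk through arrays.

    Each iteration performs two additions, one subtraction and one
    multiplication. The returned counts are derived from the loop
    structure, not measured by hardware counters.
    """
    a, b, c = 1.0, 0.5, 0.999
    for _ in range(iterations):
        a = a + b      # FADD
        b = b * c      # FMUL
        a = a - b      # FSUB
        c = c + 1e-9   # FADD
    return {"FADD": 2 * iterations, "FSUB": iterations,
            "FMUL": iterations, "FDIV": 0}


def instruction_mix(counts: dict[str, int]) -> dict[str, float]:
    """Share of each operation type in percent, like the table's mix columns."""
    total = sum(counts.values())
    return {op: 100.0 * n / total for op, n in counts.items()}


def mflops(total_ops: int, seconds: float) -> float:
    """Millions of floating-point operations per second."""
    return total_ops / (seconds * 1e6)


if __name__ == "__main__":
    n = 1_000_000
    start = time.perf_counter()
    counts = fp_kernel(n)
    elapsed = time.perf_counter() - start
    print(instruction_mix(counts))  # {'FADD': 50.0, 'FSUB': 25.0, 'FMUL': 25.0, 'FDIV': 0.0}
    print(f"{mflops(sum(counts.values()), elapsed):.1f} MFLOPS (interpreter-bound)")
```

A real MFLOPS benchmark needs a compiled language so the loop body maps onto actual FADD/FMUL instructions; the sketch only shows how a mix and a rate are derived from operation counts and time.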

Flops results (MFLOPS, higher is better)
Module  FADD  FSUB  FMUL  FDIV  iMac G5 1.9GHz  iMac Core Duo 1.83GHz
1       50%   0%    43%   7%    705             876
2       43%   29%   14%   14%   490             366
3       35%   12%   53%   0%    2213            1216
4       47%   0%    53%   0%    1349            1178
5       45%   0%    52%   3%    868             1109
6       45%   0%    55%   0%    1509            1291
7       25%   25%   25%   25%   341             235
8       43%   0%    57%   0%    1440            1264
Average                         1114            942

One of the G5's strengths is its floating point performance, and here we see an example of that: it holds an 18% performance advantage over the Core Duo. This does complicate the performance picture, as the move to Core Duo isn't necessarily going to be a clean victory for Apple today.
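The averages and the 18% figure follow directly from the table. A short Python check, with the MFLOPS values copied from the Flops table; the variable and function names are ours:

```python
# MFLOPS per Flops module (1-8), copied from the table
G5_1_9GHZ = [705, 490, 2213, 1349, 868, 1509, 341, 1440]
CORE_DUO_1_83GHZ = [876, 366, 1216, 1178, 1109, 1291, 235, 1264]


def mean(values: list[int]) -> float:
    """Arithmetic mean, i.e. the 'Average' row of the table."""
    return sum(values) / len(values)


g5_avg = mean(G5_1_9GHZ)                # 1114.375, shown as 1114
core_duo_avg = mean(CORE_DUO_1_83GHZ)   # 941.875, shown as 942
advantage_pct = (g5_avg / core_duo_avg - 1) * 100  # about 18.3

# Modules (numbered from 1) where the Core Duo posts the higher score
core_duo_wins = [
    module
    for module, (g5, cd) in enumerate(zip(G5_1_9GHZ, CORE_DUO_1_83GHZ), start=1)
    if cd > g5
]

print(f"G5 {g5_avg:.0f}, Core Duo {core_duo_avg:.0f}, G5 ahead by {advantage_pct:.0f}%")
# G5 1114, Core Duo 942, G5 ahead by 18%
print("Core Duo leads in modules:", core_duo_wins)
# Core Duo leads in modules: [1, 5]
```

Because this is an arithmetic mean, high-scoring modules weigh the most: module 3 alone contributes 997 of the 1,380 MFLOPS separating the two columns' totals.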

The last architectural performance test was the Queens benchmark, which does a great job of measuring the performance of a CPU's branch predictor. 

Queens is a very well-known problem in which you have to place n chess queens on an n x n board. The catch is that no queen may attack any other. The algorithm behind this benchmark is an exhaustive search for a placement in which no two queens attack each other, and it contains some very branch-intensive code.
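The article doesn't include the benchmark's source, so the following is not the Queens code that was timed, and we don't know whether that code stops at the first solution or counts them all. It is a minimal Python sketch of the same kind of exhaustive backtracking search, counting every solution; `count_queens` is our own name. Every candidate square triggers a few data-dependent tests whose outcome changes from square to square, which is the kind of branch a predictor struggles with.

```python
def count_queens(n: int) -> int:
    """Count every way to place n mutually non-attacking queens on an
    n x n board, using exhaustive backtracking one row at a time."""
    cols: set[int] = set()   # columns already holding a queen
    diags: set[int] = set()  # row - col is constant along each "\" diagonal
    antis: set[int] = set()  # row + col is constant along each "/" diagonal

    def place(row: int) -> int:
        if row == n:
            return 1  # all n queens placed: one complete solution
        found = 0
        for col in range(n):
            # Data-dependent tests: the outcome varies from square to square
            if col in cols or (row - col) in diags or (row + col) in antis:
                continue
            cols.add(col)
            diags.add(row - col)
            antis.add(row + col)
            found += place(row + 1)
            # Undo the placement and try the next column (backtrack)
            cols.remove(col)
            diags.remove(row - col)
            antis.remove(row + col)
        return found

    return place(0)


print(count_queens(8))  # 92
```

Small boards make easy sanity checks: 4 queens have 2 solutions and 6 queens have 4.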

Queens has about:

23% branches
45% memory instructions
No FP operations

On a PIII, the branch misprediction rate is as high as 19% (typically 9%). Queens runs entirely in the L1 cache.
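A misprediction rate is the fraction of branches whose direction the predictor guessed wrong. To make that concrete, here is a minimal Python sketch of a textbook 2-bit saturating-counter predictor for a single branch. The PIII, G5 and Core Duo all use more elaborate predictors, so this illustrates the idea only; the function name and the starting counter state are our assumptions.

```python
def mispredict_rate(outcomes: list[bool]) -> float:
    """Fraction of outcomes a single 2-bit saturating counter gets wrong.

    Counter states 0 and 1 predict "not taken"; 2 and 3 predict "taken".
    After each branch the counter moves one step toward the real outcome.
    """
    state = 2  # assumed starting point: weakly "taken"
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        # Saturate at 0 and 3 so one surprise doesn't flip a strong prediction
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses / len(outcomes)


# A loop branch: taken 9 times, then falls through once, repeated 10 times
loop_branch = ([True] * 9 + [False]) * 10
# A data-dependent branch that flips every time it executes
alternating = [True, False] * 50

print(mispredict_rate(loop_branch))  # 0.1
print(mispredict_rate(alternating))  # 0.5
```

Regular loop branches are easy for such a predictor, while branches whose outcome depends on the data, like the attack checks in a Queens search, push the rate up.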

As Johan mentioned in his article, it seemed as if a good branch predictor was very important to the chip's designers.  The necessity for a good branch predictor is also evident when you look at how long it takes the G5 to access main memory.  For this test, we looked at Queens performance with 16 queens on the chessboard:

Branch Predictor Performance - Queens (N=16)

The G5 completely dominates the Core Duo here. On a chip with a relatively short pipeline, such as the Core Duo, branch prediction usually receives less attention than on a chip with a longer pipeline, where each misprediction throws away more in-flight work.
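A common back-of-the-envelope model shows why pipeline length matters: the cycles lost per instruction to mispredictions equal the fraction of instructions that are branches, times the misprediction rate, times the penalty per misprediction, and that penalty grows roughly with the number of stages that must be flushed. A minimal Python sketch using the Queens branch fraction (23%) and the typical 9% misprediction rate quoted above; the 10- and 20-cycle penalties are hypothetical round numbers, not measured figures for either chip.

```python
def branch_stall_cpi(branch_fraction: float, mispredict_rate: float,
                     penalty_cycles: int) -> float:
    """Extra cycles per instruction lost to branch mispredictions."""
    return branch_fraction * mispredict_rate * penalty_cycles


QUEENS_BRANCH_FRACTION = 0.23   # from the Queens instruction mix
TYPICAL_MISPREDICT_RATE = 0.09  # the "typical" rate quoted for the PIII

# Hypothetical penalties for a shorter and a longer pipeline
for penalty in (10, 20):
    extra = branch_stall_cpi(QUEENS_BRANCH_FRACTION, TYPICAL_MISPREDICT_RATE, penalty)
    print(f"{penalty}-cycle penalty: +{extra:.3f} cycles per instruction")
```

In this model, doubling the penalty doubles the loss, so a chip with a long pipeline has more to gain from an accurate predictor.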

35 Comments

  • snookie - Friday, February 3, 2006

    The article is very good but surprisingly makes the same mistake as so many other reviews, which is to test with only 512MB of RAM. The Intel iMac is a much better machine with more RAM and it doesn't make sense to test it with the minimum amount. Also, Universal apps are coming fast and furious on a daily basis. I've got 1.5 GB of RAM in mine and lots of the little apps I use every day are already UB and are nice and fast, as are the OS and iLife apps. It won't be long before Windows runs on these as well as Linux, with Red Hat promising support. Check out Bare Feats for some pretty nice benchmarks including games. Yes, Quake 4 will actually run at a decent speed, as will COD 2.
    http://www.barefeats.com/imcd.html
  • csoto - Friday, February 3, 2006

    Your only complaints stem from a poor choice of models/configurations. The 20" unit will provide the added resolution, and BTO options allow up to 2GB on the Core Duo and 2.5GB on the G5 (although a 2GB SO-DIMM is listed at >$1K!). This is like me complaining that my minivan doesn't have a navigation system, because I was too cheap to buy the model that came with it :)

    Also, your assertion that the Core Duo is a "public beta" is absurd. You had zero problems running applications. Word from those around me that are testing Core Duos is that for most applications, you don't even notice Rosetta. Pro Apps users would complain, but they're never early adopters, because their apps always lag at least a few months behind the latest platform (remember the "multiprocessor plug-in" that allowed Photoshop to limp along for so long before a "MP-native" version was released?). This is a solid platform transition, likely exceeding the fairly solid (albeit far more daunting for the day) transition from 680x0 to PPC.

    Now if only VMware would ship Workstation for Mac OS X, then I could ditch the Dell...

    Charles
  • Furen - Sunday, February 5, 2006

    He says he already had an iMac, so in order to compare the two I'm guessing he bought the closest-matching one possible. It would hardly do to compare a 20" iMac with a 17" one in power consumption, or to run them at different native resolutions. I do agree that the RAM limits the system insanely, but he went with default specs rather than start fixing all the drawbacks each system has.

    The reason he says this is like a public beta is not that Rosetta sucks or anything of the sort, but that there are almost no universal binaries besides those shipped by Apple. Apple chose to bring these systems forward (at first they had said the systems would come out mid '06, I believe) without having enough of a software base, and that's a pretty big drawback.
  • jepapac - Wednesday, February 1, 2006

    I was just wondering if the graphics adapter on the iMac is upgradeable since it is using PCI Express. Does anyone know?
  • aliasfox - Thursday, February 2, 2006

    I'm guessing it's actually the laptop X1600 in the iMac, soldered onto the motherboard. Unfortunate, yes, but given the primary audience that the iMac is targeted at, I'm not surprised.

    Your average home user would rather buy a new $600-1000 box instead of dropping ~$500 for more RAM, a bigger hard drive, new graphics, and a faster processor.
  • Eug - Thursday, February 2, 2006

    quote:

    I'm guessing it's actually the laptop X1600 in the iMac

    Why? Previous iMacs used desktop GPU parts.
  • aliasfox - Thursday, February 2, 2006

    I read somewhere that the 9600 in the second generation iMac G5 was a laptop part, and I therefore assumed that since Apple used the same GPUs in the iMac that it used in PowerBooks (GeForce FX5200, Radeon 9600, X1600), it was sourcing the same parts for both lines.

    Also, I've never read about an integrated 9600 or FX5200 as a desktop part. I might be mistaken though.
  • nizzki - Tuesday, January 31, 2006

    Any idea which compilers Apple has used for their apps? For example, for the PPC apps I assume Apple uses the IBM compiler heavily optimized for PPC instead of GCC.
    If that is the case, with the Intel compiler for OS X still in beta, the current somewhat lackluster performance of the Core Duo might be skewed in PPC's favor. This would be further exacerbated if Apple used GCC to compile the Macintel apps, since it is unlikely to be heavily optimized for the Core Duo architecture.
  • Commodus - Tuesday, January 31, 2006

    Just a heads-up, Anand: the Core Duo iMac is the first iMac model to support desktop spanning, not just mirroring. So if you want, you can hook up even a 23" Cinema Display and get a huge amount of extra workspace. I'd probably only do that with a 20" iMac and the 256 MB video memory option, though.
  • ingoldsby - Tuesday, January 31, 2006

    Perhaps it's just me, but the non-native apps I run seem to run at about the same speed as they natively ran on my G5, while the universal binaries run much faster.

    I would love to see this comparison revisited with a realistic amount of memory in the machine (i.e. 1GB+) instead of limiting the machine to 512MB.
