Original Link: https://www.anandtech.com/show/1621
Intel Pentium 4 6xx and 3.73EE: Favoring Features Over Performance
by Anand Lal Shimpi & Derek Wilson on February 21, 2005 6:15 AM EST - Posted in CPUs
Introduction
Just last week, we saw the first tests of Intel's newest Xeon processor formerly codenamed Irwindale. The major improvement Irwindale offers over Nocona is an extra 1MB of L2 cache. Our dual processor server configuration showed the 2MB cache of the Irwindale based Xeon offering a significant improvement under certain workloads. In a shared front side bus dual processor configuration, the improved cache hit rate of the 2MB Xeon helps to keep the NetBurst architecture from getting tangled up in the length of its pipeline when working with lots of data. As an added bonus, the impact of sharing a front side bus is softened when processors find more of the data they are looking for locally. On the consumer side, Intel's 600 series doesn't have to deal with shared busses or server sized workloads. Will the 2MB L2 cache still come through and offer a significant performance improvement?
The short answer is that consumer applications running on a single processor system don't see the same kind of benefit from a 2MB L2 as do server workloads running on a DP Xeon. There are areas where performance is affected, but this time around Intel is again refining and broadening its platform rather than simply scaling up speed and power. Let's take a look at the new offerings introduced this week.
First off we've got the new Pentium 4 600 series, launched in four models:
Model | Clock Speed | Socket | L2 Cache | FSB |
Intel Pentium 4 660 | 3.6GHz | LGA-775 | 2MB | 800MHz |
Intel Pentium 4 650 | 3.4GHz | LGA-775 | 2MB | 800MHz |
Intel Pentium 4 640 | 3.2GHz | LGA-775 | 2MB | 800MHz |
Intel Pentium 4 630 | 3.0GHz | LGA-775 | 2MB | 800MHz |
What advantage does the Pentium 4 600 offer over the 500 series? The main features are a 2MB L2 cache, Enhanced Intel SpeedStep Technology (EIST) and EM64T support (Intel's version of AMD's x86-64). The Pentium 4 600 is still built on the same 90nm process as the Pentium 4 500; it simply has twice the cache (which we'll talk about later). Features like EIST and EM64T support were always present on previous 90nm Pentium 4s; they were simply not enabled.
Currently the 500 and 600 series chips are priced to coexist with one another. First, let's have a look at Intel's official prices:
Clock Speed (Model) | Pentium 4 500 Series | Pentium 4 600 Series |
3.8GHz (Model _70) | $637 | Q2 Release |
3.6GHz (Model _60) | $417 | $605 |
3.4GHz (Model _50) | $278 | $401 |
3.2GHz (Model _40) | $218 | $273 |
3.0GHz (Model _30) | $178 | $224 |
Then let's take a look at street prices for the chips using our RealTime Pricing Engine:
Clock Speed (Model) | Pentium 4 500 Series (street price) | Pentium 4 600 Series (street price) |
3.8GHz (Model _70) | $690 | Q2 Release |
3.6GHz (Model _60) | $425 | $635 |
3.4GHz (Model _50) | $279 | $429 |
3.2GHz (Model _40) | $231 | $295 |
3.0GHz (Model _30) | $184 | $257 |
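To put the two pricing tables in perspective, here's a quick sketch that works out how much extra the 600 series costs at each clock speed. The prices are the ones listed above; the function name and dictionary layout are our own and purely illustrative.

```python
# Official and street prices (USD) from the tables above,
# keyed by clock speed in GHz: (500 series, 600 series).
OFFICIAL = {3.6: (417, 605), 3.4: (278, 401), 3.2: (218, 273), 3.0: (178, 224)}
STREET = {3.6: (425, 635), 3.4: (279, 429), 3.2: (231, 295), 3.0: (184, 257)}

def premium_pct(price_500, price_600):
    """Percentage premium of the 600 series part over the 500 series part."""
    return (price_600 - price_500) / price_500 * 100

for label, table in (("official", OFFICIAL), ("street", STREET)):
    for ghz, (p500, p600) in table.items():
        print(f"{ghz:.1f}GHz {label}: +{premium_pct(p500, p600):.1f}%")
```

On official pricing, the premium runs from roughly 25% at the low end to about 45% at 3.6GHz.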
The other thing to note is that the 500 series still holds the clock speed crown, with the 570J running at 3.8GHz, while the fastest 600 series is a 3.6GHz Pentium 4 660. What we're seeing here is another example of Intel's move away from clock speeds as the only "improvements" from chip to chip. We will however see a 3.8GHz Pentium 4 670 in Q2 of this year.
Intel's next announcement is the move to a new 90nm core for the Pentium 4 Extreme Edition. Until now, all EE chips have been based off of the old 130nm Northwood core, but with the move up to 3.73GHz the Extreme Edition actually uses the same 90nm core as the new Pentium 4 600 series.
Giving up its 2MB L3 cache in favor of a lower latency 2MB L2 cache, the new Extreme Edition only offers two benefits over the regular Pentium 4 600 series CPUs: clock speed and 1066MHz FSB support. At $999, the new Extreme Edition is priced in accordance with its name, as all of its predecessors have been.
The new core, shared by both the Pentium 4 600 and the new Extreme Edition chips, is still built on the same 90nm process as the original Prescott, but thanks to the larger cache weighs in at 169 million transistors, an increase of 44 million (or 35%) over the original Prescott 1M core.
There's a decent amount to discuss with this new core, so let's start at the biggest change - the cache.
Twice the Cache - 17% Higher Latency
Both the Pentium 4 6xx and the new Extreme Edition share the same core, meaning they also have the same L2 cache. When Intel first launched Prescott, we noticed that cache latencies went up tremendously in the move to the new architecture. The increase in cache latencies was to be expected, as one tradeoff of a larger cache is that it takes longer to find and access data. So when we heard that Intel was moving to a 2MB L2 cache with the 6xx series, we wondered how much slower the cache would get.
First we wanted to confirm that L1 cache latencies stayed the same, and they did at 4 cycles for the new Prescott 2M based core:
CPU | Cachemem L1 Latency | ScienceMark L1 Latency |
AMD Athlon 64 | 3 cycles | 3 cycles |
Intel Pentium 4 (Northwood) | 1 cycle | 2 cycles |
Intel Pentium 4 (Prescott) | 4 cycles | 4 cycles |
Intel Pentium 4 (Prescott 2M) | 4 cycles | 4 cycles |
Intel Pentium M | 3 cycles | 3 cycles |
Next up was L2 cache latency. In our review of the Pentium M processor on the desktop we discovered that its 10 cycle L2 cache was responsible for its solid performance in non "media rich" applications (e.g. office applications, OS performance). The original Prescott had a 23 cycle L2 cache, and with a 2MB cache the latency has gone up to 27 cycles:
CPU | Cachemem L2 Latency | ScienceMark L2 Latency |
AMD Athlon 64 | 17 cycles | 18 cycles |
Intel Pentium 4 (Northwood) | 16 cycles | 16 cycles |
Intel Pentium 4 (Prescott) | 23 cycles | 23 cycles |
Intel Pentium 4 (Prescott 2M) | 27 cycles | 27 cycles |
Intel Pentium M | 10 cycles | 10 cycles |
While we're talking about "only" 4 cycles, at 3.6GHz that's 17% longer to access data from L2 cache. Given Prescott's extremely lengthy pipeline, a 17% increase in L2 cache latency is not going to help minimize the downsides of such a long pipeline. Also keep in mind that the only architectural change here is a larger L2 cache, so none of the normal tricks to help hide memory latencies are expanded upon in the new Pentium 4.
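For those who prefer absolute time to cycle counts, the arithmetic behind the 17% figure can be sketched as follows. The cycle counts and the 3.6GHz clock come from the table above; the helper name is ours.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in core clock cycles to nanoseconds."""
    return cycles / clock_ghz

prescott_1m = 23  # L2 latency in cycles, original Prescott
prescott_2m = 27  # L2 latency in cycles, Prescott 2M
clock_ghz = 3.6

increase_pct = (prescott_2m - prescott_1m) / prescott_1m * 100
print(f"L2 latency increase: {increase_pct:.1f}%")
print(f"Prescott 1M: {cycles_to_ns(prescott_1m, clock_ghz):.2f} ns")
print(f"Prescott 2M: {cycles_to_ns(prescott_2m, clock_ghz):.2f} ns")
```

At 3.6GHz, the extra 4 cycles work out to a little over one nanosecond per L2 access.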
What Intel is counting on is that the increase in hit rate provided by a 100% larger cache will outshine the 17% longer access to L2 cache. Did Intel make the right bet? In order to find out, we took the new Pentium 4 660 (3.6GHz - 2MB L2) and compared it to the old Pentium 4 560 (3.6GHz - 1MB L2) with all other variables the same. Let's see how much of an impact the extra megabyte of cache has in the real world.
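The bet can be framed with the textbook average access time model: the cost of an L2 access is its latency plus the miss rate multiplied by the penalty of going out to main memory. The sketch below is only an illustration of that trade-off. The 300-cycle memory penalty and the 5% starting miss rate are hypothetical numbers we picked, not measurements.

```python
def avg_l2_access_cycles(l2_latency, l2_miss_rate, memory_penalty):
    """Average cost of an L2 access: latency plus expected miss cost."""
    return l2_latency + l2_miss_rate * memory_penalty

def break_even_miss_rate_drop(latency_old, latency_new, memory_penalty):
    """How far the miss rate must fall for the slower cache to break even."""
    return (latency_new - latency_old) / memory_penalty

MEMORY_PENALTY = 300   # hypothetical main memory penalty, in cycles
OLD_MISS_RATE = 0.05   # hypothetical L2 miss rate with 1MB

drop = break_even_miss_rate_drop(23, 27, MEMORY_PENALTY)
old_cost = avg_l2_access_cycles(23, OLD_MISS_RATE, MEMORY_PENALTY)
new_cost = avg_l2_access_cycles(27, OLD_MISS_RATE - drop, MEMORY_PENALTY)
print(f"Miss rate must drop by {drop * 100:.2f} percentage points to break even")
print(f"Old: {old_cost:.1f} cycles, new at break-even: {new_cost:.1f} cycles")
```

Under these assumptions, the 2MB cache has to cut the miss rate by about 1.3 percentage points just to match the 1MB part. Any workload that already fits in 1MB can't deliver that, which is why some tests end up slower.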
In the business category, we see the added cache paying off a little. SYSMark shows good improvement in the document creation portion of its tests, while the Business Winstone makes some very good gains. Worldbench shows web browsing with Mozilla to have improved a good bit while our compression test and the ACDSee test show a loss in performance. These losses generally indicate areas where the test is more dependent on latency than cache hit rate. On the content creation side, adding Windows Media Encoder to the Mozilla test improves performance more than the individual Mozilla test. This is likely due to the fact that the large cache keeps Mozilla's data from being kicked out while Windows Media Encoder is working.
On the gaming front, Doom 3 is the only test where we saw a notable performance improvement (UT2004 gained less than 2%). The only other application to show a significant performance gain is Maya, with more than a 43% gain. The huge gain in performance under Maya is likely a result of 1MB of cache being too small to fit models in while 2MB is enough. This seems to be a case where the test is very bandwidth sensitive rather than latency sensitive. Dropping most (if not all) of the data being worked on into the L2 cache offers a program a very large boost in apparent bandwidth.
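The "doesn't fit in 1MB, fits in 2MB" cliff is easy to reproduce with a toy cache simulation. The sketch below assumes a fully associative LRU cache with 64-byte lines and a hypothetical 1.5MB data set that is swept repeatedly. Real L2 caches are set associative and have prefetchers, so treat this as an illustration of the effect rather than a model of Prescott.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache line size for this toy model

def lru_hit_rate(cache_bytes, working_set_bytes, passes=10):
    """Hit rate of a fully associative LRU cache under repeated linear sweeps."""
    capacity = cache_bytes // LINE_BYTES
    lines = working_set_bytes // LINE_BYTES
    cache = OrderedDict()  # keys are line numbers, ordered oldest to newest
    hits = accesses = 0
    for _ in range(passes):
        for line in range(lines):
            accesses += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)  # mark as most recently used
            else:
                if len(cache) >= capacity:
                    cache.popitem(last=False)  # evict least recently used
                cache[line] = True
    return hits / accesses

MB = 1024 * 1024
working_set = 3 * MB // 2  # hypothetical 1.5MB data set
for cache_mb in (1, 2):
    rate = lru_hit_rate(cache_mb * MB, working_set)
    print(f"{cache_mb}MB cache: {rate:.0%} hit rate")
```

With the 1MB cache, every line is evicted before the sweep comes back around to it, so nothing ever hits. With 2MB, only the first cold pass misses.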
As we can see, the unfortunate truth for performance on the 600 series is that most consumer data sets can fit into a 1MB cache just fine. The added cache does seem to help with multitasking from our limited investigation of the subject. The more threads that hit memory aggressively, the better chance we have of seeing a benefit from the 2MB cache. This is because less data from each thread will be kicked out of the cache, resulting in fewer pipeline stalls.
Unfortunately, most usage models that are a good fit for the 600 series are server and workstation workloads. Streaming data (using or encoding media), games, and most other consumer applications don't work on the kind of large data sets that can really separate the performance of the 1MB and 2MB parts.
As we've provided this chart and gone through the general impact of the benchmarks on Intel's new 600 line, we won't include analysis on the pages with our benchmark data. For those who are interested in a deeper look at the numbers and performance of all 5 new parts, graphs of each benchmark are included later in this article.
Impact of L2 Cache Size on Performance (1MB vs. 2MB - 3.60GHz)
Benchmark | 1MB L2 | 2MB L2 | 2MB Performance Advantage |
Business/General Use Performance |
Business Winstone 2004 | 21.4 | 24.2 | 13.0% |
SYSMark 2004 - Communication | 137 | 137 | 0.0% |
SYSMark 2004 - Document Creation | 201 | 218 | 8.4% |
SYSMark 2004 - Data Analysis | 184 | 186 | 1.0% |
Microsoft Office XP with SP-2 | 522 | 520 | 0.3% |
Mozilla 1.4 | 459 | 422 | 8.0% |
ACD Systems ACDSee PowerPack 5.0 | 547 | 558 | -2.0% |
Ahead Software Nero Express 6.0.0.3 | 545 | 550 | -0.9% |
WinZip Computing WinZip 8.1 | 412 | 411 | 0.2% |
WinRAR | 479 | 469 | -2.0% |
Multitasking Content Creation Performance |
Content Creation Winstone 2004 | 32.7 | 33.9 | 3.7% |
SYSMark 2004 - 3D Creation | 231 | 231 | 0.0% |
SYSMark 2004 - 2D Creation | 288 | 279 | -3.1% |
SYSMark 2004 - Web Publication | 206 | 203 | -1.0% |
Mozilla and Windows Media Encoder | 676 | 601 | 11.1% |
Video/Photo Creation & Editing |
Adobe Photoshop 7.0.1 | 342 | 342 | 0.0% |
Adobe Premiere 6.5 | 461 | 468 | -1.5% |
Roxio VideoWave Movie Creator 1.5 | 287 | 276 | 3.8% |
Audio/Video Encoding |
MusicMatch Jukebox 7.10 | 484 | 470 | 2.9% |
DivX Encoding | 55.3 | 55.4 | 0.2% |
XviD Encoding | 33.9 | 33.4 | -1.4% |
Microsoft Windows Media Encoder 9.0 | 2.57 | 2.56 | -0.3% |
Gaming |
Doom 3 | 84.6 | 88.6 | 4.7% |
UT2004 | 59.3 | 60.4 | 1.9% |
Wolfenstein: ET | 97.2 | 95.5 | -1.7% |
3D Rendering |
Discreet 3dsmax 5.1 (DX) | 268 | 266 | 0.7% |
Discreet 3dsmax 5.1 (OGL) | 327 | 329 | -0.6% |
SPECapc 3dsmax 6 | 1.64 | 1.62 | -1.1% |
Professional 3D |
SPECviewperf 8 - 3dsmax-03 | 17.04 | 17.11 | 0.4% |
SPECviewperf 8 - catia-01 | 13.87 | 13.57 | -2.2% |
SPECviewperf 8 - light-07 | 14.3 | 13.83 | -3.3% |
SPECviewperf 8 - maya-01 | 13.12 | 18.85 | 43.7% |
SPECviewperf 8 - proe-03 | 16.7 | 16.5 | -1.2% |
SPECviewperf 8 - sw-01 | 13.09 | 13.33 | 1.8% |
SPECviewperf 8 - ugs-04 | 15.31 | 13.82 | -9.7% |
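For anyone wanting to check the last column, the advantage can be recomputed with a few lines. The table doesn't label its units, so the sketch assumes that rows where a lower 2MB number yields a positive advantage (the Mozilla tests, for instance) are times, and that the rest are scores where higher is better. The function name and flag are ours.

```python
def advantage_pct(result_1mb, result_2mb, higher_is_better=True):
    """Percentage advantage of the 2MB part relative to the 1MB result."""
    if higher_is_better:
        return (result_2mb - result_1mb) / result_1mb * 100
    # For time-based tests, a smaller number is the better result.
    return (result_1mb - result_2mb) / result_1mb * 100

print(f"maya-01: {advantage_pct(13.12, 18.85):.1f}%")
print(f"ugs-04: {advantage_pct(15.31, 13.82):.1f}%")
print(f"Mozilla + WME: {advantage_pct(676, 601, higher_is_better=False):.1f}%")
```

A few rows in the table differ from this calculation by a tenth of a percent, which is likely down to rounding in the published scores.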
An Interesting Observation: Prescott 2M's Die
What has been true for a number of modern day microprocessors is that the vast majority of the CPU is made up of cache. Take a look at the Athlon 64 FX with its 1MB L2 cache:
AMD Athlon 64 FX die (the large block to the right is its 1MB L2 cache)
Over half of the die is L2 cache.
But when looking at the new Prescott 2M core the same can't be said:
Intel Pentium 4 600 series die (the large block to the left is its 2MB L2 cache)
The split between logic and cache is almost 50/50. Looking back at the original Prescott, we see that the Prescott core itself actually occupied more die area than the cache:
Intel Pentium 4 500 series (the block to the left is its 1MB L2 cache)
There are two explanations for the phenomenon. First, with Prescott Intel introduced the highest density cache it has ever produced; at present it holds the record for the largest cache in the smallest area on a modern desktop microprocessor. Second, Prescott, with its 31 pipeline stages, 64-bit execution units and highly accurate branch predictors with massive branch history tables, is simply a big, complex core. Also remember that the Athlon 64 is basically a reworked K7 core, with wider execution units and datapaths, as well as an on-die memory controller, so it is understandably simpler.
Let's compare that to the Pentium M:
Intel Pentium M 90nm (the large block is its 2MB L2 cache)
The PowerPC 970 (used in Apple's G5 systems) looks a bit more Prescott like, but remember we're only dealing with a 512KB L2 cache here:
IBM PowerPC 970 (the block to the left is its 512KB L2 cache)
Or, even more interestingly, compare it to the newly announced Cell processor:
IBM/Sony/Toshiba Cell microprocessor (we apologize for the quality of the die shot, it's the best we could find)
Looking at Cell is quite interesting because it appears to be just as complex as Prescott, but remember that with Cell we're looking at 9 individual processors. But more on that next week...
Extreme Edition - Not so "Extreme" Anymore
Back when the Extreme Edition was first launched, the Pentium 4 had a "meager" 512KB L2 cache compared to the EE's 2MB L3 and 512KB L2. Now that the Pentium 4 has a 2MB L2 cache and is based off of the same core as the EE, the only benefit that the Extreme Edition offers is 1066MHz FSB support and a slightly higher clock speed.
We looked at the impact of the 1066MHz FSB in the past and quickly found that it didn't do much for the EE. Although we're now running at 3.73GHz for the Pentium 4 Extreme Edition, the benefit from the 1066MHz FSB is still pretty limited.
As for the clock speed advantage, the fastest Pentium 4 6xx is the 660, running at 3.60GHz - 96.4% of the clock speed of the 3.73GHz Extreme Edition - the clock speed difference is effectively nothing.
But the price? The 3.73 EE will retail for $999, the Pentium 4 660: $605. The Extreme Edition was never a good value, but in the case of the new chip, it's basically throwing money away. Let the benchmarks speak for themselves, but your best bet is to wait for the next generation of Extreme Edition CPUs, either with a 4MB L2 cache or the dual core offerings.
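The value argument is simple enough to put into numbers. The sketch below uses only the prices and clock speeds quoted above; the helper name is ours.

```python
def pct_more(a, b):
    """How much larger a is than b, in percent."""
    return (a - b) / b * 100

price_premium = pct_more(999, 605)    # 3.73 EE vs. Pentium 4 660, USD
clock_premium = pct_more(3.73, 3.60)  # GHz

print(f"Price premium: {price_premium:.1f}%")
print(f"Clock speed premium: {clock_premium:.1f}%")
```

That is roughly 65% more money for under 4% more clock speed, plus the 1066MHz FSB.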
Lower Power Consumption
Over time, any good chip manufacturer will be able to tweak and fine tune their manufacturing process to improve yields. The first 90nm Pentium 4s have been in production for over a year now, and thus it's not too far of a stretch to think that today's 90nm Pentium 4 6xx CPUs are being built on a higher yield 90nm process. In addition to normal tweaks in any manufacturing process, the new 6xx series introduces a handful of power saving techniques that help to reduce overall power consumption of the chip.
First introduced in the 5xxJ series, the Enhanced Halt State (C1E) and Thermal Monitor 2 are both mechanisms included in the 6xx series to reduce power.
Whenever the OS executes the halt instruction, the CPU enters what is known as the halt state. Architecturally, what happens in a halt state is that the clock signal to the CPU is shut off for some period of time. With no clock signal, none of the logic in the chip does anything, and thus power consumption is reduced. Performance is also significantly reduced; however, the halt instruction isn't usually called during application usage, so the performance aspects of the halt state aren't very important.
The problem with the halt state is that it does nothing to reduce voltage; it only reduces current draw by stopping clocks from going to the CPU. Since power varies linearly with both current and voltage (P = I * V), you're effectively only addressing half of the problem. The Enhanced Halt State, as Intel calls it, does two things: it reduces the clock speed of the CPU by dropping the clock multiplier to its minimum value (on the 6xx series that's 14x, or 2.8GHz), and then it reduces the voltage. The clock speed is reduced first and the voltage second in order to maintain stability.
Intel insists that the enhanced halt state is a significantly lower power state than the conventional halt state, thanks to the reduction in voltage in addition to the reduction in clock speed. While the standard halt state causes a linear reduction in power, the enhanced halt state lowers voltage as well, so the decrease in power is greater than linear (dynamic power scales roughly with the square of voltage), potentially offering better power savings than the standard halt state. The real world impact obviously depends on how idle your system happens to be.
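To see why dropping voltage along with clock speed matters so much, here is a minimal sketch using the usual first-order approximation for dynamic CMOS power, which scales with frequency and with the square of voltage. It ignores leakage, which is not negligible on a 90nm part, and the two voltages are hypothetical values we picked for illustration, not Intel specifications.

```python
def relative_dynamic_power(volts, ghz, ref_volts, ref_ghz):
    """Dynamic power relative to a reference point, using P ~ V^2 * f."""
    return (volts / ref_volts) ** 2 * (ghz / ref_ghz)

REF_VOLTS = 1.4  # hypothetical full speed core voltage
LOW_VOLTS = 1.2  # hypothetical reduced core voltage

clock_only = relative_dynamic_power(REF_VOLTS, 2.8, REF_VOLTS, 3.6)
clock_and_volts = relative_dynamic_power(LOW_VOLTS, 2.8, REF_VOLTS, 3.6)

print(f"3.6GHz -> 2.8GHz, same voltage: {clock_only:.0%} of full power")
print(f"3.6GHz -> 2.8GHz, lower voltage: {clock_and_volts:.0%} of full power")
```

Under these assumptions, the clock drop alone leaves you at about 78% of full dynamic power, while adding the voltage drop takes it to about 57%.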
When the Pentium 4 was first launched there was a lot of bad journalism out there about how it would overheat and reduce its clock speed significantly thanks to its integrated Thermal Monitor. If the Pentium 4 sensed that it was operating outside of safe temperatures, its Thermal Monitor could reduce the effective clock speed of the CPU by approximately 50%, once again by cutting clocks to the CPU. In reality, the Pentium 4's clock throttling never actually came into play unless your fan stopped or your heatsink fell off. More recent Pentium 4s, however, have been pushing the thermal envelope further and further, finally to the point where throttling can be a problem if you don't use high quality thermal compound and make sure your heatsink is absolutely secure. The performance reduction when the processor throttles is usually pretty significant.
The new 6xx series of CPUs also includes Thermal Monitor 2 (TM2), which, as in the case of the enhanced halt state, reduces clock speed (to 2.8GHz) and voltage as well. The performance impact due to TM2 is much less than the original implementation, so it can actually be triggered during normal use without an overly noticeable loss of performance. However, if reducing the clock speed and voltage isn't enough, the CPU will still shut itself down in order to avoid any damage, just like the other Pentium 4s did.
Yes, we ran Windows Media Encoder 9 with the fan off for 5 minutes.
But both the Enhanced Halt State and TM2 were introduced in the 5xxJ CPUs; what's new to the 6xx series is Enhanced Intel SpeedStep Technology (EIST). What EIST does is very similar to AMD's Cool'n'Quiet: it is a demand based reduction in CPU clock speed and voltage. Using the same mechanism of adjusting clock speed and voltage, the Pentium 4 6xx will dynamically raise or lower its clock speed between 2.8GHz and its normal operating frequency, along with its voltage, based on application demand, in order to optimize for power consumption.
Because of the way EIST (and AMD's Cool'n'Quiet) works, there's inherently a drop in performance. The idea is this - if you're performing a task that's not using 100% of the CPU, the CPU will operate at a slightly reduced frequency in order to conserve power. So while some tasks will require that the system run at full speed, others will run at speeds as low as 2.8GHz. With a minimum multiplier of 14x, slower Pentium 4 6xx CPUs won't get a huge benefit from EIST. For example, the Pentium 4 630 runs at 3.0GHz, meaning the drop down to 2.8GHz isn't really going to conserve a ton of power, nor decrease performance all that much.
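The headroom EIST has to work with on each model follows directly from the multipliers. The sketch below assumes a 200MHz base clock (the 800MHz FSB is quad-pumped, which matches the 14x = 2.8GHz figure above) and uses the four 600 series clock speeds from the launch table; the function name is ours.

```python
BASE_CLOCK_MHZ = 200  # 800MHz FSB is quad-pumped from a 200MHz clock
MIN_MULTIPLIER = 14   # EIST floor on the 6xx series, per the text

def eist_headroom(rated_mhz):
    """Return (rated multiplier, max clock reduction in percent) under EIST."""
    multiplier = rated_mhz // BASE_CLOCK_MHZ
    floor_mhz = MIN_MULTIPLIER * BASE_CLOCK_MHZ
    reduction_pct = (rated_mhz - floor_mhz) / rated_mhz * 100
    return multiplier, reduction_pct

for model, mhz in (("630", 3000), ("640", 3200), ("650", 3400), ("660", 3600)):
    mult, cut = eist_headroom(mhz)
    print(f"Pentium 4 {model}: {mult}x, EIST can drop the clock by up to {cut:.1f}%")
```

The 630 can only shed one multiplier step (under 7% of its clock speed), while the 660 can shed four steps, or about 22%.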
AMD's Cool'n'Quiet appears to be more flexible, as it can reduce the clock speed all the way down to 800MHz.
How much of a performance impact does EIST result in? Using a 100% load test such as Windows Media Encoder wouldn't tell us much, as EIST would never really kick in. But something like Winstone where the CPU load is varied, is a much better indication - without EIST, the Pentium 4 660 was approximately 5% faster in Business Winstone than with EIST enabled. Under Doom 3, there was no performance difference.
Intel Officially Adds 64-bit
The Pentium 4 600 series and 3.73EE officially enable Intel's 64-bit extensions to x86 (EM64T, Intel's version of AMD's x86-64). We will have a full look at the 64-bit performance of both AMD and Intel's implementations as soon as Microsoft Windows XP x64 is released.
The Test
Our hardware configurations are similar to what we've used in previous comparisons.
AMD Athlon 64 Configuration
Socket-939 Athlon 64 CPUs
2 x 512MB OCZ PC3200 EL Dual Channel DIMMs 2-2-2-10
NVIDIA nForce4 Reference Motherboard
ATI Radeon X800 XT PCI Express
Intel Pentium 4 Configuration
LGA-775 Intel Pentium 4 and Extreme Edition CPUs
2 x 512MB Crucial DDR-II 533 Dual Channel DIMMs 3-3-3-12
Intel 925XE Motherboard
ATI Radeon X800 XT PCI Express
Business/General Use Performance
Business Winstone 2004
Business Winstone 2004 tests the following applications in various usage scenarios:
. Microsoft Access 2002
. Microsoft Excel 2002
. Microsoft FrontPage 2002
. Microsoft Outlook 2002
. Microsoft PowerPoint 2002
. Microsoft Project 2002
. Microsoft Word 2002
. Norton AntiVirus Professional Edition 2003
. WinZip 8.1
Office Productivity SYSMark 2004
SYSMark's Office Productivity suite consists of three tests, the first of which is the Communication test. The Communication test consists of the following:
"The user receives an email in Outlook 2002 that contains a collection of documents in a zip file. The user reviews his email and updates his calendar while VirusScan 7.0 scans the system. The corporate web site is viewed in Internet Explorer 6.0. Finally, Internet Explorer is used to look at samples of the web pages and documents created during the scenario."
The next test is Document Creation, which BAPCo describes as:
"The user edits the document using Word 2002. He transcribes an audio file into a document using Dragon NaturallySpeaking 6. Once the document has all the necessary pieces in place, the user changes it into a portable format for easy and secure distribution using Acrobat 5.0.5. The user creates a marketing presentation in PowerPoint 2002 and adds elements to a slide show template."
The final test in our Office Productivity suite is Data Analysis, which BAPCo describes as:
"The user opens a database using Access 2002 and runs some queries. A collection of documents are archived using WinZip 8.1. The queries' results are imported into a spreadsheet using Excel 2002 and are used to generate graphical charts."
Microsoft Office XP SP-2
Here we see that in the purest of office application tests, performance doesn't vary all too much.
Mozilla 1.4
Quite possibly the most frequently used application on any desktop is the one we pay the least amount of attention to when it comes to performance. While a bit older than the core that is now used in Firefox, performance in Mozilla is worth looking at as many users are switching from IE to a much more capable browser on the PC - Firefox.
ACD Systems ACDSee PowerPack 5.0
ACDSee is a popular image editing tool that is great for basic image editing options such as batch resizing, rotating, cropping and other such features that are too elementary to justify purchasing something as powerful as Photoshop for. There are no extremely complex filters here, just pure batch image processing.
Ahead Software Nero Express 6.0.0.3
While it was a major issue in the past, these days buffer underrun errors while burning a CD or DVD are few and far between thanks to high performance CPUs as well as vastly improved optical drives. When you take the optical drive out of the equation, how do these CPUs stack up in burning performance?
As you'd guess, they're all pretty much the same, with the slight variations between chips falling within expectations. Any of these chips will do just fine.
WinZip
Archiving performance ends up being fairly CPU bound as well as I/O limited.
WinRAR 3.40
Pulling the hard disk out of the equation we can get a much better idea of which processors are truly best suited for file compression.
Multitasking Content Creation
MCC Winstone 2004
Multimedia Content Creation Winstone 2004 tests the following applications in various usage scenarios:
. Adobe® Photoshop® 7.0.1
. Adobe® Premiere® 6.50
. Macromedia® Director MX 9.0
. Macromedia® Dreamweaver MX 6.1
. Microsoft® Windows Media™ Encoder 9 Version 9.00.00.2980
. NewTek's LightWave® 3D 7.5b
. Steinberg™ WaveLab™ 4.0f
As you can see above, Lightwave is part of the MCC Winstone 2004 benchmark suite. As an individual application, Lightwave does manage to get a healthy performance benefit with multithreaded rendering enabled, especially when paired with Hyperthreading enabled CPUs like the Pentium 4s here today. All chips were tested with Lightwave set to spawn 4 threads.
ICC SYSMark 2004
The first category that we will deal with is 3D Content Creation. The tests that make up this benchmark are described below:
"The user renders a 3D model to a bitmap using 3ds max 5.1, while preparing web pages in Dreamweaver MX. Then the user renders a 3D animation in a vector graphics format."
Next, we have 2D Content Creation performance:
"The user uses Premiere 6.5 to create a movie from several raw input movie cuts and sound cuts and starts exporting it. While waiting on this operation, the user imports the rendered image into Photoshop 7.01, modifies it and saves the results. Once the movie is assembled, the user edits it and creates special effects using After Effects 5.5."
The Internet Content Creation suite is rounded out with a Web Publishing performance test:
"The user extracts content from an archive using WinZip 8.1. Meanwhile, he uses Flash MX to open the exported 3D vector graphics file. He modifies it by including other pictures and optimizes it for faster animation. The final movie with the special effects is then compressed using Windows Media Encoder 9 series in a format that can be broadcast over broadband Internet. The web site is given the final touches in Dreamweaver MX and the system is scanned by VirusScan 7.0."
Mozilla + Media Encoder
While AMD dominated in WorldBench 5's Mozilla test, encoding a file using Windows Media Encoder in the background not only makes this test more appreciative of the Pentium 4 but also of Hyper Threading.
Video Creation/Photo Editing
Adobe Photoshop 7.0.1
Adobe Premiere 6.5
Roxio VideoWave Movie Creator 1.5
While Premiere is a wonderful professional application, consumers will prefer something a little easier to use. Enter: Roxio's VideoWave Movie Creator, a fairly full featured yet consumer level video editing package.
Audio/Video Encoding
MusicMatch Jukebox 7.10
DivX 5.2.1 with AutoGK
Armed with the DivX 5.2.1 and the AutoGK front end for Gordian Knot, we took all of the processors to task at encoding a chapter out of Pirates of the Caribbean. We set AutoGK to give us 75% quality of the original DVD rip and did not encode audio.
XviD with AutoGK
Another very popular codec is the XviD codec, and thus we measured encoding performance using it instead of DivX for this next test. The rest of the variables remained the same as the DivX test.
Windows Media Encoder 9
To finish up our look at Video Encoding performance we've got two tests both involving Windows Media Encoder 9. The first test is WorldBench 5's WMV9 encoding test.
But once we crank up the requirements a bit and start doing some HD quality encoding under WMV9 the situation changes dramatically:
Gaming Performance
Doom 3
Unreal Tournament 2004
Wolfenstein: Enemy Territory
An oldie but a goodie, Enemy Territory is still played quite a bit and makes for a great CPU test as today's GPUs can easily handle the rendering load of the Quake 3 based game.
3D Rendering
3dsmax 5.1
WorldBench includes two 3dsmax benchmarks using version 5.1 of the popular 3D rendering and animation package: a DirectX and an OpenGL benchmark.
3dsmax 6
For the next 3dsmax test we used version 6 of the program and ran the SPECapc rendering tests to truly stress these CPUs. Since there's not much new to report here we're only going to report the Rendering Composite score. For more details feel free to read our Athlon 64 FX-55 Review where we analyze the performance data in much greater depth.
Workstation Applications
Visual Studio 6
Carried over from our previous CPU reviews, we continue to use Visual Studio 6 for a quick compile test. We are still using the Quake 3 source code as our test and measure compile time in seconds. The results are pretty much in line with what we've seen in the past.
SPECviewperf 8
For our next set of professional application benchmarks we turn to SPECviewperf 8. SPECviewperf is a collection of application traces taken from some of the most popular professional applications, and compiled together in a single set of benchmarks used to estimate performance in the various applications the benchmark is used to model. With version 8, SPEC has significantly improved the quality of the benchmark, making it even more of a real world indicator of performance.
We have included SPEC's official description of each one of the 8 tests in the suite.
3dsmax Viewset (3dsmax-03)
"The 3dsmax-03 viewset was created from traces of the graphics workload generated by 3ds max 3.1. To insure a common comparison point, the OpenGL plug-in driver from Discreet was used during tracing.
The models for this viewset came from the SPECapc 3ds max 3.1 benchmark. Each model was measured with two different lighting models to reflect a range of potential 3ds max users. The high-complexity model uses five to seven positional lights as defined by the SPECapc benchmark and reflects how a high-end user would work with 3ds max. The medium-complexity lighting models uses two positional lights, a more common lighting environment.
The viewset is based on a trace of the running application and includes all the state changes found during normal 3ds max operation. Immediate-mode OpenGL calls are used to transfer data to the graphics subsystem."
CATIA Viewset (catia-01)
"The catia-01 viewset was created from traces of the graphics workload generated by the CATIA™ V5R12 application from Dassault Systems.
Three models are measured using various modes in CATIA. Phil Harris of LionHeart Solutions, developer of CATBench2003, supplied SPEC/GPC with the models used to measure the CATIA application. The models are courtesy of CATBench2003 and CATIA Community. The car model contains more than two million points. SPECviewperf replicates the geometry represented by the smaller engine block and submarine models to increase complexity and decrease frame rates. After replication, these models contain 1.2 million vertices (engine block) and 1.8 million vertices (submarine).
State changes as made by the application are included throughout the rendering of the model, including matrix, material, light and line-stipple changes. All state changes are derived from a trace of the running application. The state changes put considerably more stress on graphics subsystems than the simple geometry dumps found in older SPECviewperf viewsets.
Mirroring the application, draw arrays are used for some tests and immediate mode used for others."
Lightscape Viewset (light-07)
"The light-07 viewset was created from traces of the graphics workload generated by the Lightscape Visualization System from Discreet Logic. Lightscape combines proprietary radiosity algorithms with a physically based lighting interface.
The most significant feature of Lightscape is its ability to accurately simulate global illumination effects by precalculating the diffuse energy distribution in an environment and storing the lighting distribution as part of the 3D model. The resulting lighting "mesh" can then be rapidly displayed."
Maya Viewset (maya-01)
"The maya-01 viewset was created from traces of the graphics workload generated by the Maya V5 application from Alias.
The models used in the tests were contributed by artists at NVIDIA. Various modes in the Maya application are measured.
State changes as made by the application are included throughout the rendering of the model, including matrix, material, light and line-stipple changes. All state changes are derived from a trace of the running application. The state changes put considerably more stress on graphics subsystems than the simple geometry dumps found in older viewsets.
As in the Maya V5 application, array element is used to transfer data through the OpenGL API."
Pro/ENGINEER (proe-03)
"The proe-03 viewset was created from traces of the graphics workload generated by the Pro/ENGINEER 2001TM application from PTC.
Two models and three rendering modes are measured during the test. PTC contributed the models to SPEC for use in measurement of the Pro/ENGINEER application. The first of the models, the PTC World Car, represents a large-model workload composed of 3.9 to 5.9 million vertices. This model is measured in shaded, hidden-line removal, and wireframe modes. The wireframe workloads are measured both in normal and antialiased mode. The second model is a copier. It is a medium-sized model made up of 485,000 to 1.6 million vertices. Shaded and hidden-line-removal modes were measured for this model.
This viewset includes state changes as made by the application throughout the rendering of the model, including matrix, material, light and line-stipple changes. The PTC World Car shaded frames include more than 100MB of state and vertex information per frame. All state changes are derived from a trace of the running application. The state changes put considerably more stress on graphics subsystems than the simple geometry dumps found in older viewsets.
Mirroring the application, draw arrays are used for the shaded tests and immediate mode is used for the wireframe. The gradient background used by the Pro/E application is also included to better model the application workload."
SolidWorks Viewset (sw-01)
"The sw-01 viewset was created from traces of the graphics workload generated by the Solidworks 2004 application from Dassault Systemes.
The model and workloads used were contributed by SolidWorks as part of the SPECapc for SolidWorks 2004 benchmark.
State changes as made by the application are included throughout the rendering of the model, including matrix, material, light and line-stipple changes. All state changes are derived from a trace of the running application. The state changes put considerably more stress on graphics subsystems than the simple geometry dumps found in older viewsets.
Mirroring the application, draw arrays are used for some tests and immediate mode used for others."
Unigraphics (ugs-04)
"The ugs-04 viewset was created from traces of the graphics workload generated by Unigraphics V17.
The engine model used was taken from the SPECapc for Unigraphics V17 application benchmark. Three rendering modes are measured -- shaded, shaded with transparency, and wireframe. The wireframe workloads are measured both in normal and anti-aliased mode. All tests are repeated twice, rotating once in the center of the screen and then moving about the frame to measure clipping performance.
The viewset is based on a trace of the running application and includes all the state changes found during normal Unigraphics operation. As with the application, OpenGL display lists are used to transfer data to the graphics subsystem. Thousands of display lists of varying sizes go into generating each frame of the model.
To increase model size and complexity, SPECviewperf 8.0 replicates the model two times more than the previous ugs-03 test."
Final Words
So what conclusions do we draw from all this?
The 600 series is more about feature set than performance. Trading a lower cache miss rate for higher cache latency isn't exactly the best path to follow in the consumer market. Most PC workloads don't push enough threads or large enough data sets to really take advantage of the larger cache. We can see the potential improvement in the 43% increase under Maya, and looking back at the Irwindale benchmarks, it's obvious that strapping 2MB of higher latency cache onto NetBurst has its place. But that won't be the draw of the 600 series on the desktop.
The introduction of EIST and EM64T on the desktop (from the mobile and server space respectively) is a point in the 600 series' favor. Dropping very powerful processors into SFF boxes is more of a possibility with the better heat management features. Of course, the faster the chip, the larger the difference in power when EIST kicks in, since all of the processor models drop to the same frequency and voltage. As for EM64T, we still don't have a 64-bit OS from Microsoft. We are on a release candidate, so hopefully we will see a shipping product soon.
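To put a rough shape on the EIST point above, here is a minimal sketch using the common approximation that dynamic CMOS power scales with frequency times voltage squared. The clock speeds come from the 600 series model table; the shared idle clock and both voltages are illustrative assumptions, not measurements of these chips, so only the trend is meaningful.

```python
# Rough sketch: dynamic CMOS power scales approximately with f * V^2.
# The idle clock and both voltages are illustrative assumptions,
# not measured values for these processors.

IDLE_GHZ = 2.8      # assumed common clock that every model drops to
IDLE_VOLTS = 1.2    # hypothetical idle voltage
LOAD_VOLTS = 1.4    # hypothetical full-speed voltage

# Full-speed clocks from the 600 series model table.
MODELS_GHZ = {"630": 3.0, "640": 3.2, "650": 3.4, "660": 3.6}


def dynamic_power_saving(load_ghz, idle_ghz=IDLE_GHZ,
                         load_volts=LOAD_VOLTS, idle_volts=IDLE_VOLTS):
    """Fraction of dynamic power saved when dropping from the load state to idle."""
    ratio = (idle_ghz / load_ghz) * (idle_volts / load_volts) ** 2
    return 1.0 - ratio


savings = {name: dynamic_power_saving(ghz) for name, ghz in MODELS_GHZ.items()}

if __name__ == "__main__":
    for name, saving in savings.items():
        print(f"Pentium 4 {name}: ~{saving:.0%} lower dynamic power at idle")
```

Whatever the real voltages turn out to be, the idle state is the same for every model under these assumptions, so the chip with the highest full-speed clock has the furthest to fall and sees the largest relative drop.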
The value of these new processors isn't much greater than that of the 500 series. If 32-bit performance is your only worry, then the 600 processors are not the place to look until the 3.8GHz model becomes available. For those who are interested in the new technology from Intel, it may pay to wait and see if prices on the new parts fall after being on the market for some time. Intel wants both of these lines to coexist, but without a 64-bit Windows, there just isn't enough to sell the 600 series over the 500 series yet.
The new 600 series isn't as much of a step forward in performance as it is a step sideways for Intel. As the 600 series core will be the one on which dual core chips are based, it does make sense for Intel to introduce power saving features and a larger cache. Our advice is to look at your favorite application and pick the part that offers the performance you need at the best price. For those who need EIST and EM64T now, even though there is a price premium, performance under the 600 series is generally on par with (or better than) the 500 series.
And if you feel like paying for Intel's 65nm fab plants, feel free to buy the new Pentium 4 3.73GHz Extreme Edition, but if you want the same performance and still want an Intel CPU, the Pentium 4 660 will do just as well.
With dual core coming this year, performance where it is, and street prices showing up higher than we would like to see them, we have trouble recommending the Pentium 4 600 series to anyone who doesn't need it.