Original Link: https://www.anandtech.com/show/757



Introduction

We were admittedly quite harsh on the Pentium 4 in our initial review of it last November, but it wasn’t without good cause. The technology behind the Pentium 4 was impressive to say the least, but its architectural advantages did not translate into a real world performance improvement over the much more cost effective Athlon or, in many cases, even the older Pentium III. But not for a minute did we believe that Intel had completely wasted the 5 years of development time that went into the Pentium 4. In fact, the Pentium 4’s launch was very reminiscent of quite a few launches in Intel’s history. The releases of the 386, 486, Pentium and Pentium Pro were all met with extreme skepticism, as none of those architectures offered a significant performance improvement over its predecessor at launch. In the long run, however, each one of those architectures prevailed, although it generally took until the first or second implementation of the architecture for it to truly show its advantages.

In our Pentium 4 Review we mentioned that two main conditions had to be met in order for the Pentium 4 to succeed. The first was application demand for the technology: if applications are increasingly optimized for the Pentium 4’s NetBurst architecture, then the Pentium 4 will definitely succeed. Currently, applications are obviously more geared towards the P5 and P6 architectures, which have been around for a combined total of 9 years, compared to the less than 1 year that NetBurst has been around.

The second condition was that the Pentium 4 had to ramp up in clock speed.  With the architecture’s Hyper Pipelined technology less is being done in every clock, so in order for the Pentium 4 to truly surpass its predecessors it must boast a higher clock speed than the 1.5GHz it was introduced at.  After all, the Pentium 4 and NetBurst were designed for high clock speed operation. 

While the Pentium 4 hasn’t been out long enough for us to expect a huge turnaround in the demands of today’s commonly used applications, today Intel is announcing the first Pentium 4 clock speed increase over the 1.5GHz P4 that was launched five months ago.

Dollars and cents

Along with the increased clock speed, Intel has been pushing for lower pricing on the Pentium 4. Over the next month we should see the Pentium 4 1.7GHz drop in price to well below $400. While that will still keep it at least $100 above the price of AMD’s flagship Athlon-C 1.33GHz, it is a much more competitive price point than it was when the Pentium 4 was first released.

Intel has also helped lower the overall cost of Pentium 4 ownership by bundling 2 x 64MB PC800 RDRAM RIMMs with their retail boxed CPUs. As prices continue to fall, Pentium 4s may begin to be offered in boxed configurations with more memory as well as with no bundled memory.

Also looking at the pricing of i850 based Pentium 4 motherboards, we see another effort on Intel’s part to keep the cost of ownership of the Pentium 4 as low as possible.  Looking at our Weekly Memory & Motherboard Price Guide from April 13, 2001 you’ll note that the difference in price between the ASUS P4T (i850 board) and the AMD 760 based ASUS A7M266 is only $6.  The reason this is important to notice is because the i850 chipset costs motherboard manufacturers almost twice as much as the AMD 760 chipset does, yet the only difference in price between those two boards is $6.  This either means that Pentium 4 boards based on the i850 chipset are quite cheap to produce (both of the aforementioned boards are 6-layer designs) or it could mean that Intel is eating a lot of the cost of the chipset in an effort to push more Pentium 4 systems into the market.

The effort is an admirable one; however, there are a few more hurdles that lie in Intel’s path. One is that the Pentium 4 does require a new power supply (ATX12V) and chassis, as we described in our original review. Although most motherboards will still allow you to run without the 4-pin ATX12V connector installed, Intel obviously doesn’t recommend doing that. We have yet to investigate the ramifications of not using an ATX12V power supply with a Pentium 4 system, other than verifying that it does work without the +12V power connector attached.

Another issue the Pentium 4 has in terms of being able to maintain a low price is that it has an extremely large die meaning that the cost of manufacturing is fairly high.  This won’t change until the third quarter of this year when the 0.13-micron Pentium 4 is released.

From a price standpoint, the Pentium 4 is still a few steps away from being as affordable as the AMD Athlon; however, the conditions have improved tremendously since the Pentium 4 was launched last year. There are no indications of these conditions deteriorating over time either; they should only improve.

With third party Pentium 4 chipsets on the way, the continually decreasing price of RDRAM, and the forthcoming 0.13-micron Northwood core, the Pentium 4 platform is definitely headed in the right direction from a pricing standpoint. 



Now Showing at 1.7GHz

The Pentium 4 1.7GHz has not changed at all from the 1.3, 1.4 and 1.5GHz parts that are currently available. A brief overview of its specs is provided below; however, since we have covered them all in great depth already, we will only provide a quick description of the feature set here. For more information consult our original review of the Pentium 4.



Hyper Pipelined Technology – The Pentium 4 features a much longer pipeline than either the Pentium III or the Athlon. This unfortunately means that the Pentium 4 accomplishes less per clock; however, it does pave the way for the Pentium 4 to achieve much higher clock speeds. The theory is that doing less per clock doesn’t matter if you can hit incredibly high clock speeds, so the higher clocks should give the Pentium 4 a greater performance advantage over its predecessors. Case in point: the Pentium III was only able to reach 1GHz on its 0.18-micron process, while the Pentium 4 is currently at 1.7GHz on the same 0.18-micron process. And as you’re about to see, there is a clear performance difference between the two.

Improved Branch Prediction – Obviously with such a long pipeline, it is necessary to have an improved Branch Prediction Unit, which the Pentium 4 does boast. The BPU is arguably the most advanced in this sector, and branch prediction is an area that has held back the Athlon’s performance somewhat. Luckily, it seems like AMD will also be giving the Athlon an improved BPU in the upcoming Palomino core. In any case, the Pentium 4’s BPU must be solid, otherwise the penalties associated with its Hyper Pipelined Architecture would cripple the P4 beyond repair.

Rapid Execution Engine – Two of the Pentium 4’s ALUs (Arithmetic Logic Units; they handle integer operations) are double pumped, meaning they transfer twice as much data per clock, effectively giving them throughput identical to that of ALUs operating at twice the core frequency. In the case of the 1.7GHz Pentium 4, this means that the ALUs operate as if they were normal ALUs (not double pumped) clocked at 3.4GHz. As we have discovered in the past, this is necessary in order to provide the Pentium 4 with respectable performance when running integer code. Integer code is generally much more susceptible to mis-predicted branches, so the lower latency, higher effective clock ALUs help minimize the branch mis-predict penalties associated with the Pentium 4’s extremely long pipeline when dealing with integer operations.

12K micro-op trace cache – This special cache replaces and improves upon the traditional L1 instruction cache.  The 8-way set associative Execution Trace Cache caches micro-ops after they have been decoded and they are also cached in the predicted path of execution.  This helps to hide some of the performance penalties caused by such a long pipeline.

256KB Advanced Transfer Cache – The Pentium 4’s L2 cache subsystem is quite incredible to say the least.  Not only does the processor have a 256-bit internal pathway to its L2 cache, it is also able to transfer data from the cache once every clock meaning that it has the highest peak cache bandwidth figures of any processor in its class.  At 1.7GHz, the Pentium 4 has a maximum of 54.4GB/s of bandwidth to/from its L2 cache.  In comparison a Pentium III at 1.0GHz can only offer 16GB/s of bandwidth for L2 data transfers and similarly an Athlon at 1.33GHz can only offer 10GB/s of peak bandwidth (the Athlon only has a 64-bit datapath to its L2).

Hardware Prefetch – The Pentium 4 is able to predict what data it will need from main memory before that data is actually requested, and it will fetch it directly into cache so that the data is already there when it is requested. In the event that the data isn’t needed, this becomes a waste of cache space and also FSB/memory bandwidth. In either case, Hardware Prefetch is a FSB/memory bandwidth hog. Luckily, the next feature of the Pentium 4 architecture helps keep that from being a problem.

Quad Pumped 100MHz FSB + Dual Channel RDRAM – The Pentium 4 has a 100MHz FSB that is quad pumped to offer data bandwidth equivalent to that of a 400MHz FSB, meaning it can transfer at most 3.2GB/s of data to the Pentium 4. This bus runs synchronously with the i850’s (P4 chipset) dual channel RDRAM setup, which runs at 400MHz over two 16-bit wide buses for a total of 3.2GB/s of peak memory bandwidth. While RDRAM was not necessary on the Pentium III platform, when coupled with the Pentium 4, the bandwidth RDRAM offers is very well appreciated.

SSE2 – The Pentium 4 offers an improvement over the original 70 SSE instructions with its 144 new SSE2 instructions. However, even under SPEC CPU2000, the performance improvement offered by SSE2 optimizations alone is supposedly around 5%. With SPEC CPU2000 being a highly synthetic benchmark, it is unlikely that SSE2 would translate into any real world performance gains in today’s applications. One thing that isn’t being taken into account here is SSE2’s ability to handle two 64-bit SIMD-Int and SIMD-FP (Single Instruction Multiple Data) operations. This ability isn’t being taken advantage of in SPEC CPU2000 and could prove to be one of SSE2’s greatest assets.
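The peak bandwidth figures quoted in the feature list above all come from the same arithmetic: bus width in bytes, times clock, times transfers per clock, times number of channels. The sketch below is only an illustration of that arithmetic. The function name is our own, and a few inputs are assumptions not stated in the text: a 64-bit wide FSB data path, RDRAM transferring on both clock edges (which is what makes 400MHz RDRAM "PC800"), and a Pentium III L2 that moves data every other clock, chosen because it reproduces the 16GB/s figure.

```python
def peak_bandwidth_gbs(width_bits, clock_ghz, transfers_per_clock=1.0, channels=1):
    """Peak bandwidth in GB/s: bytes per transfer x transfers per second x channels."""
    return (width_bits / 8) * clock_ghz * transfers_per_clock * channels

# L2 cache paths (widths and clocks taken from the text)
l2_p4 = peak_bandwidth_gbs(256, 1.7)
# Assumption: one transfer every other clock, picked to match the quoted 16GB/s
l2_p3 = peak_bandwidth_gbs(256, 1.0, transfers_per_clock=0.5)
l2_athlon = peak_bandwidth_gbs(64, 1.33)

# Quad pumped 100MHz FSB; the 64-bit data width is an assumption
fsb_p4 = peak_bandwidth_gbs(64, 0.1, transfers_per_clock=4)
# Dual channel RDRAM: 16-bit channels at 400MHz, assumed two transfers per clock
rdram = peak_bandwidth_gbs(16, 0.4, transfers_per_clock=2, channels=2)

for name, value in [
    ("Pentium 4 1.7GHz L2", l2_p4),
    ("Pentium III 1.0GHz L2", l2_p3),
    ("Athlon 1.33GHz L2", l2_athlon),
    ("Pentium 4 FSB", fsb_p4),
    ("Dual channel PC800 RDRAM", rdram),
]:
    print(f"{name}: {value:.1f} GB/s")
```

Under these assumptions the Athlon works out to a little over 10GB/s, and the FSB and RDRAM figures come out equal, which is the point of pairing the two.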



Heat and Overclocking

The Pentium 4 1.7GHz’s on-die thermal diode provided temperature readings close to those of our Athlon 1.33GHz. However, as you may know, the Athlon does not have an on-die sensor to provide truly accurate core temperature readings. Its temperature is measured outside of the core, where it is relatively cool compared to the on-die temperature. Since that external reading is nearly equivalent to the Pentium 4’s on-die reading, we can conclude that the Pentium 4 at 1.7GHz is running cooler than the Athlon at 1.33GHz.

During our tests the Pentium 4 1.7GHz always operated at 1.7GHz and did not fall victim to any clock throttling because of heat.  You shouldn’t worry about the Pentium 4 dropping its clock speed because of heat unless you are running the processor without a heatsink/fan installed. 

Unfortunately we weren’t able to conduct any overclocking tests on our 1.7GHz Pentium 4 sample since we are waiting for BIOS updates from a few motherboard manufacturers for proper recognition of the processor first.  Only the Intel 850 and MSI 850 boards had publicly available BIOSes with 1.7GHz support at the time of publication and thus we limited our tests to those two platforms which performed within 1% of one another. 



A new Benchmark and Scary Scores

We’ve got good news and unfortunately bad news to report in the benchmarking scene.  As far as the good news goes, BAPCo has finally released SYSMark 2001 which thankfully addressed the issues we have been complaining about for so long with SYSMark 2000 and even introduced a few new interesting ideas into the benchmarking mix.

SYSMark 2001 is finally a multitasking benchmark, running multiple Content Creation (Photoshop, Premiere, Dreamweaver, etc…) and/or Office Productivity (Word, Excel, Netscape, etc…) applications at once while switching between them. The benchmark even includes Outlook 2000 as a part of the test, with a decent sized mailbox on which it performs searches, deletes and replies among other operations. For a full description of the benchmark you can read BAPCo’s whitepaper.

Interestingly enough, SYSMark 2001 outputs not only a final rating score but also an Average Response Time metric, which indicates how long the system takes on average to respond to user requests and commands. This time does not include the time it takes for the “user” to input data or initiate commands since those functions are independent of the performance of your system. How fast you click isn’t determined by what speed CPU you’re running. The Average Response Time is a very useful number to look at because it allows us to translate the somewhat meaningless SYSMark ratings into a much more tangible number.

SYSMark 2001 also employs a WebMark (also by BAPCo/MadOnion) type of testing that runs the benchmark in “user time” instead of in accelerated time like many of the other benchmarking scripts. This means that the benchmark takes a certain amount of time to input characters, select commands, etc… The only complaint we have here is that it takes a little too long to do so, but it is bearable. The benchmark is much more robust and intense than SYSMark 2000, taking up almost 4GB installed and truly taxing a system much like any power user would. We welcome the benchmark to our test suite and gladly throw out SYSMark 2000, which we have always had problems with.

The unfortunate news comes in light of an issue discovered with Ziff Davis Media’s Winstone (Business and Content Creation), Windows 2000 SP1 and motherboards with VIA South Bridges. As if the issues with VIA South Bridges couldn’t get any worse, it seems like the aforementioned configuration (take note of the SP1 requirement on the Windows 2000 installation) could artificially inflate Winstone scores. Luckily we only just switched to Windows 2000 SP1 for our last Athlon 1.33GHz review, so this does not affect any previous reviews; however, it does invalidate the Winstone scores in that review. All other benchmarks/scores remain unchanged.

The maintainer of the benchmark, e-Testing Labs, is currently working with Microsoft on finding out if it is an OS issue. Currently the only way around it is to either use a platform without a VIA South Bridge (the issue seems to be North Bridge independent) or test under Windows 2000 without SP1 installed. We have thus suspended the use of Winstone in our conventional CPU test suite until a better workaround is found. Other reviewers should be aware of this problem and investigate their scores accordingly.



The Test

Windows 98SE / 2000 Test System

Hardware

CPU(s)

Intel Pentium III 1.0GHz
Intel Pentium 4 1.7GHz
Intel Pentium 4 1.5GHz
Intel Pentium 4 1.3GHz
AMD Athlon "Thunderbird" 1.33GHz
AMD Athlon "Thunderbird" 1.0GHz

Motherboard(s)

ASUS CUSL2
Intel D850GB
ASUS A7M266/A7V133/A7V

Memory

256MB PC133 Corsair SDRAM (Micron -7E CAS2)
256MB PC2100 Corsair DDR SDRAM
256MB PC800 Samsung RDRAM

Hard Drive

IBM Deskstar 30GB 75GXP 7200 RPM Ultra ATA/100

CDROM

Philips 48X

Video Card(s)

NVIDIA GeForce2 Ultra 64MB DDR (default clock - 250/230 DDR)

Ethernet

Linksys LNE100TX 100Mbit PCI Ethernet Adapter

Software

Operating System

Windows 98 SE
Windows 2000 Professional

Video Drivers

NVIDIA Detonator3 v6.50 @ 1024 x 768 x 16 @ 75Hz
NVIDIA Detonator3 v6.50 @ 1280 x 1024 x 32 (SPECviewperf) @ 75Hz
VIA 4-in-1 4.29V was used for all VIA based boards

Benchmarking Applications

Gaming

Unreal Tournament 4.32 Reverend's Thunder.dem
Quake III Arena v1.27g demo127.dm3
Mercedes-Benz Truck Racing Timedemo
Serious Sam Public Test 2 (Coop Party 04)

Productivity

BAPCo SYSMark 2001
Benchmark Studio Beta 2.0
MusicMatch Jukebox 6.0 (MP3 Encoder Tests)

Synthetic

Cachemem
Linpack


Cachemem – Latency Comparison

We have said countless times that the Pentium 4’s architecture is centered on low latency operation, but it is about time that we actually investigated that statement a little closer.  Hopefully this will give us some hints on unlocking the performance of the Pentium 4.

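Cachemem - Latency Comparison
Time in Cycles (lower is better)

| Data Size | Pentium 4 1.7GHz | Athlon 1.33GHz |
|-----------|------------------|----------------|
| 4KB       | 1                | 1              |
| 8KB       | 2                | 4              |
| 16KB      | 2                | 4              |
| 32KB      | 19               | 4              |
| 64KB      | 19               | 4              |
| 128KB     | 19               | 20             |
| 256KB     | 25               | 20             |
| 512KB     | 281              | 202            |
| 1MB       | 289              | 202            |
| 2MB       | 292              | 229            |
| 4MB       | 292              | 238            |
| 8MB       | 295              | 241            |
| 16MB      | 297              | 244            |
| 32MB      | 297              | 245            |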

We have provided you with two objects to look at, a graph and a table; they both represent the same data but sometimes it’s just easier to look at a graph while we also need the numbers located in the data table to perform some analysis upon.  The latency data is taken with respect to 4KB accesses.

The first thing to notice is that the Pentium 4’s L1 data cache (8KB) can indeed be accessed in half the time of the Athlon’s L1 data cache (64KB).  At the same time we do see that as soon as the size of the test data grows far beyond 8KB, the latency increases dramatically from 2 cycles up to 19 cycles.  In contrast, the Athlon’s larger L1 data cache allows it to maintain a fairly low latency up to 64KB.  By the time the Athlon hits the 128KB data point its latency jumps to 20 cycles and stays there even up to 256KB. 

Remember that the Athlon has an exclusive L2 cache subsystem, meaning that its 256KB of L2 is all free for data, while the Pentium 4’s L2 cache contains a duplicate of the data in its L1 data cache as well.  Although only 8KB in size, this L1 cache duplication in L2 gives the Pentium 4 a higher latency at 256KB than the Athlon but only by 5 cycles.

At the 512KB data point both of the processors have run out of precious cache to depend on and now they both fall victim to the perils of their memory buses. Fortunately for the Athlon, it makes the transition to DDR SDRAM, which costs it an additional 182 cycles of latency (202 cycles versus 20 at 256KB). The Pentium 4, however, is penalized by RDRAM’s higher latency and suffers an additional 256 cycles of memory latency (281 cycles versus 25 at 256KB) simply by moving to 512KB. Imagine what kind of a performance hit is incurred if an application has a footprint just slightly larger than the Pentium 4’s L2 cache: it would come from a latency penalty of roughly 256 cycles.

Remember when we said that there is a good possibility that the 0.13-micron Pentium 4 (codename: Northwood) would have a 512KB L2 cache? Here is strong evidence showing why a larger L2 cache would help. In order to lessen the effects of RDRAM’s relatively high latency, a larger L2 cache could keep the Pentium 4’s performance quite strong in those applications that aren’t necessarily memory bandwidth dependent but more latency dependent. As we have seen from other recent investigations, these applications are more commonplace today, so a larger L2 cache would definitely help the Pentium 4. Provided that the 0.13-micron Pentium 4 has a 512KB L2 cache, that latency penalty wouldn’t be incurred at the 512KB marker, allowing more of today’s applications to hold their data within the processor’s cache. A larger L2 cache would help the Athlon as well, but not as much since it is already paired up with a fairly low latency memory subsystem.
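To make the cache boundaries in the latency table easier to spot, here is a small sketch that scans each processor's column for the points where latency grows at least five-fold from one data size to the next. The numbers are copied from the table. The function name and the five-fold threshold are our own choices for illustration and are not part of Cachemem.

```python
SIZES = ["4KB", "8KB", "16KB", "32KB", "64KB", "128KB", "256KB",
         "512KB", "1MB", "2MB", "4MB", "8MB", "16MB", "32MB"]

# Latency in cycles, copied from the Cachemem table
PENTIUM4 = [1, 2, 2, 19, 19, 19, 25, 281, 289, 292, 292, 295, 297, 297]
ATHLON = [1, 4, 4, 4, 4, 20, 20, 202, 202, 229, 238, 241, 244, 245]

def latency_jumps(latencies, factor=5.0):
    """Return (size, previous, current) wherever latency grows by >= factor."""
    jumps = []
    for i in range(1, len(latencies)):
        prev, cur = latencies[i - 1], latencies[i]
        if cur >= factor * prev:
            jumps.append((SIZES[i], prev, cur))
    return jumps

p4_jumps = latency_jumps(PENTIUM4)
athlon_jumps = latency_jumps(ATHLON)

for name, jumps in [("Pentium 4 1.7GHz", p4_jumps), ("Athlon 1.33GHz", athlon_jumps)]:
    for size, prev, cur in jumps:
        print(f"{name}: {prev} -> {cur} cycles at {size} (+{cur - prev})")
```

Each processor shows two jumps: one where the test data no longer fits in the L1 data cache and one at 512KB where it spills out of L2 and into main memory.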

Speaking of memory subsystems, do take note of the Athlon’s advantage of more than 50 cycles towards the latter half of the graph. But as you are about to see, latency is only half of the picture.



Cachemem – Cache Bandwidth Comparison

Although the Pentium 4’s L1 cache is of a much lower latency than the Athlon’s, AMD has definitely got a 64KB part of their die that can be marked advantage-AMD. As you can see, the Pentium 4 at 1.7GHz and the Athlon at 1.33GHz offer similar bandwidth figures in reads from the L1 cache. However, writing to the L1 cache is much in favor of the Athlon, which offers almost twice as much bandwidth as the Pentium 4 can.

We have been complaining about the Athlon’s 64-bit path to its L2 cache ever since the Thunderbird core was introduced, and here is why. In bandwidth intensive applications, the Pentium 4 has a huge advantage in L2 cache bandwidth of 68% in reads and over 40% in writes. Luckily for AMD, its lower latency memory bus is able to hide this since most of today’s applications are in fact more latency dependent than they are memory bandwidth dependent (to a certain extent; using PC66 SDRAM is obviously going to cause a bottleneck).



Memory Bandwidth Comparison

Again you see that the Pentium 4 can truly make use of RDRAM while the Pentium III had very little use for that type of bandwidth.  The first thing to notice here is that both setups advertise considerably higher memory bandwidth figures than are actually being delivered here, and Cachemem isn’t even a real world use benchmark. 

The dual channel PC800 RDRAM setup of the Pentium 4 promises up to 3.2GB/s of memory bandwidth and delivers half of that.  The Athlon’s PC2100 DDR SDRAM promises 2.1GB/s and provides about 52% of that as well.
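To put those percentages into absolute terms, the short sketch below multiplies each platform's advertised peak by the fraction the text says Cachemem actually measured. The function name is ours, and the results are only as precise as the rounded "half" and "about 52%" figures quoted above.

```python
def delivered_gbs(peak_gbs, fraction_delivered):
    """Delivered bandwidth in GB/s given the advertised peak and the measured fraction."""
    return peak_gbs * fraction_delivered

# Advertised peaks and delivered fractions as quoted in the text
p4_rdram = delivered_gbs(3.2, 0.50)     # dual channel PC800 RDRAM
athlon_ddr = delivered_gbs(2.1, 0.52)   # PC2100 DDR SDRAM

print(f"Pentium 4 + RDRAM: about {p4_rdram:.1f} GB/s delivered")
print(f"Athlon + DDR:      about {athlon_ddr:.1f} GB/s delivered")
print(f"Pentium 4 advantage: {p4_rdram / athlon_ddr - 1:.0%}")
```

Because both platforms deliver roughly the same fraction of their peak, the Pentium 4's lead in delivered bandwidth stays close to its lead on paper.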

There is nothing too surprising here; we have always known the Pentium 4’s platform to have much more memory bandwidth than any competing solution. 

To conclude our bandwidth measurements we have the trusty Linpack benchmark. As you may know, the first half of the graph is determined mainly by cache performance and thus clock speed (because in both of these cases the L2 cache operates at the clock frequency). However, we don’t want you comparing clock speeds since we’ve already been through the differences in the architecture; rather, we want you to look at the differences in the behaviors of the architectures.

Notice first of all that in spite of the clock speed advantage, the Pentium 4 does not offer a bandwidth improvement over the Athlon for the first 128KB.  This just goes to show you that clock speed doesn’t mean everything, and a clock for clock comparison between the Pentium 4 and the Athlon is not as useful as it was with the Pentium III since the P3 was much more similar to the Athlon in terms of pipeline length. 

There is a small portion of the graph, while both processors are still dealing with data in their caches, in which the Pentium 4 holds a performance advantage over the Athlon. This advantage is most likely due to the higher bandwidth L2 cache subsystem, but what is interesting is that the performance advantage only occurs in a very small portion of the graph, indicating that AMD’s 64-bit L2 cache bus may not be all that horrible for its performance.

The latter half of the graph is governed entirely by main memory bandwidth and isn’t affected by CPU performance as much.  Here you can see RDRAM’s bandwidth advantage over DDR SDRAM in a different light.

Now that we’ve seen the latency and bandwidth differences between the fastest processors from both camps, let’s step into the ring and see how these fighters do in the real world.



Content Creation Performance

As we mentioned in our description of the benchmark, SYSMark 2001 has an Average Response Time metric that it reports as well as a classic rating.  First we see that the Internet Content Creation portion of the SYSMark 2001 benchmark is heavily dominated by the Pentium 4 processor.  The reasoning behind this is simple; a large portion of this test is based on a Windows Media Encoder benchmark that happens to be quite bandwidth intensive.  As we have seen in our synthetic tests, the term bandwidth might as well be synonymous with the Pentium 4.  Even the Pentium III enjoys a slight advantage over the Athlon in this test as it does have a greater amount of L2 cache bandwidth. 

It is interesting to note that the difference between the 1GHz processors of last year and the 1.7GHz processor of today is a mere 1 second in response time.  While it is true that when added up over the course of hundreds of operations this 1 second difference translates into minutes and possibly hours of saved time, it definitely provides a different perspective on how precious such a small unit of time can be.

The Internet Content Creation ratings maintain the same standings as the average response time results we just looked at, which makes sense.  Since this is the first time we have used this suite it is appropriate for us to explain some of the tests that are involved in the benchmark.  Luckily BAPCo provides the summary we’re looking for (taken from http://www.bapco.com/sysmark2001overview.htm):

Internet Content Creation Scenario: In this scenario, a Web developer creates two web pages for an Extreme Sports company selling a Kayaking product. Images are manipulated in Photoshop and web animations created in Flash are used on a web page created by Dreamweaver. A Kayaking promotional video clip is assembled in Premiere. The page also has links to a video clip that was encoded using Windows Media Encoder.

Adobe* Photoshop* 6.0
The user opens a high definition picture and runs a few sample filters, experiments with the size and orientation, changes pixel/inch ratio, fades the image, adds a border, saves the result image under jpeg format and prints the resultant image.

Adobe* Premiere* 6.0
The user assembles a promotional video for a Kayaking product from stock footage. Various effects are added to make it compelling. The video is then exported in a compressed format.

Macromedia Dreamweaver 4
The user creates two web pages for the online extreme sports company. The first page gives an overview of four kayaking products. The second page gives the details on a specific product. The Flash animation, Photoshop images and the encoded video are imported from the respective applications.

Macromedia Flash 5
The user starts with an FLA file. The following operations are then performed: Step through the desired frames; go to the Library and locate the desired symbol to delete; Import and trace bitmaps; Flip and rotate images; Move, scale and group images. Finally export the movie.

Microsoft Windows Media Encoder 7
The user takes a video clip and encodes it in the background.



Office Productivity Performance

The performance picture changes dramatically when we look at the SYSMark 2001 Office Productivity suite.  For starters, the king of the hill is the Athlon 1.33GHz.  But an equally interesting thing to point out is that the range of response times is just 0.59s compared to 0.99s in the Internet Content Creation benchmark.  This indicates that the performance here doesn’t scale as well with CPU speed and is thus not as demanding as the Internet Content Creation tests.  Again, this does make sense judging by the operations that are carried out here. 

You may be surprised as to what category you fall into personally. Here is BAPCo’s description of these tests (taken from http://www.bapco.com/sysmark2001overview.htm):

Office Productivity Scenario: This scenario models a corporate user working for an automobile company. The user creates documents using Word, Excel and PowerPoint. The user also accesses email and queries a database. An Internet browser is used to view presentations. The user also invokes a speech to text translation, file compression and virus detection in the background.

Microsoft* Word* 2000
The user opens up an assembly manual (Word) document for a new transmission system. The user makes some formatting changes, inserts step by step diagrams, text additions, applies some different background themes, prints the document, and saves the document in web page format.

Microsoft* Excel* 2000
The user opens up some sales and revenue figures from a spreadsheet. The data is sorted and modified. Various charts related to sales and revenue are created from the data and published in web page format.

Microsoft* PowerPoint* 2000
The user opens up a business presentation to update the previous quarter’s news and sales. Some pictures of automobile manufacturing facilities are inserted into the presentation. Edits and formatting changes are made. Changes are reviewed as they are made using the slide show. The presentation is then given an appropriate background theme and saved in web page format.

Microsoft Access 2000
The user loads last month’s database and cleans up the tables, imports current month’s data from text tables, processes queries, checks results and opens the generated reports and prints them.

Microsoft Outlook 2000
The user searches for text in the messages in the inbox, archives messages, marks all items as read, spell checks, prints and sends some emails.

Netscape* Communicator* 6.0
The user opens an automotive documentation page and looks for a keyword, then checks the source file. After that, the user browses through a PowerPoint presentation saved in web format.

Dragon* NaturallySpeaking* Preferred v.5
The user transcribes a pre-recorded wave file of a document. The transcribing takes place in the background.

WinZip 8.0
The user compresses a collection of video files in the background.

McAfee VirusScan 5.13
The user runs a Virus scan on some files in the background.

The Office Productivity scores confirm our suspicions; the Pentium 4 is not scaling well at all in this benchmark. It seems like the biggest difference occurs between 1.3GHz and 1.5GHz for the Pentium 4, but after that the scaling drops right off. The 1.5GHz Pentium 4 is 11% faster than the 1.3GHz Pentium 4, while the jump to 1.7GHz only increases performance by another 4%.

The 33% increase in clock speed for the Athlon (1.0GHz -> 1.33GHz) results in a 24% performance improvement.
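One way to compare these scaling numbers is to divide each performance gain by the clock speed gain that produced it. The sketch below does that with the percentages quoted above. The "scaling efficiency" name and the helper functions are our own shorthand, not something SYSMark reports, and the Athlon's clock gain is computed from 1.0GHz and 1.33GHz rather than taken as the rounded 33%.

```python
def clock_gain(old_ghz, new_ghz):
    """Fractional clock speed increase, e.g. 1.0 -> 1.33 gives 0.33."""
    return new_ghz / old_ghz - 1.0

def scaling_efficiency(perf_gain, old_ghz, new_ghz):
    """Performance gain per unit of clock gain; 1.0 would be perfect scaling."""
    return perf_gain / clock_gain(old_ghz, new_ghz)

# (label, old clock, new clock, Office Productivity gain quoted in the text)
steps = [
    ("Pentium 4 1.3GHz -> 1.5GHz", 1.3, 1.5, 0.11),
    ("Pentium 4 1.5GHz -> 1.7GHz", 1.5, 1.7, 0.04),
    ("Athlon 1.0GHz -> 1.33GHz", 1.0, 1.33, 0.24),
]

results = {label: scaling_efficiency(perf, old, new) for label, old, new, perf in steps}

for label, efficiency in results.items():
    print(f"{label}: {efficiency:.2f} of the clock increase shows up as performance")
```

By this measure the Athlon and the first Pentium 4 step both turn roughly 70% of their extra clock speed into Office Productivity performance, while the step to 1.7GHz returns only about 30%.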



Overall & Constant Computing Performance

The overall performance crown goes to the Pentium 4 because of its success in the Internet Content Creation tests.  The Athlon 1.33GHz is close behind the Pentium 4 1.5GHz.

Although we’ve just been impressed by SYSMark 2001, CSA Research’s Benchmark Studio has been a recent favorite of ours. The type of performance Benchmark Studio tests is more of what a demanding IT user expects from his/her system in what CSA Research likes to call a “Constant Computing” scenario.

This isn’t the type of home user scenario that some are used to; this is a significantly loaded system that, in this case, features 13 concurrent stressor applications running at once.  The concurrent applications range from streaming video to accessing databases and exchange servers.

As you can see here, the performance isn’t directly related to high bandwidth but rather to a combination of bandwidth and latency, and the test is extremely taxing nevertheless. The Pentium III is just barely able to make its way, while the Pentium 4 and Athlon 1.33GHz duke it out for the top spot.



Gaming Performance

Quake III Arena has long been a favorite of the Pentium 4, and the latest release does nothing to change that.  The 1.7GHz P4 offers close to a 20% advantage over the Athlon at 1.33GHz, but don’t let Quake III Arena be the only judge of performance for you.

The picture changes dramatically in UnrealTournament, where the 1.7GHz Pentium 4 takes a 7% backseat to the Athlon. This 7% penalty obviously isn’t as great as the gap we saw under Quake III Arena, but it does show you that there is no clear performance winner in all categories yet.



Gaming Performance (cont)

Under Serious Sam, a relatively new (and fun) game, the Pentium 4 at 1.7GHz and the Athlon at 1.33GHz are separated by no more than 2 fps.  As you can see, this benchmark is scaling wonderfully with clock speed although slightly more so for the Pentium 4 since the 1.33GHz Athlon only holds a 16% increase in performance over the 1.0GHz Athlon.

We will be switching over to the final version of Serious Sam after this review.

In our final gaming test we see that the Pentium 4 once again takes the lead under Mercedes-Benz Truck Racing.  In fact, all of the Pentium 4 processors take the lead, indicating that an architectural advantage is keeping the line ahead of the competition.  Potential explanations include cache/memory bandwidth or the Pentium 4’s high-bandwidth FSB.



FPU/MP3 Encoding Performance

Finally we have our audio encoding test, which illustrates that although bandwidth may have become synonymous with the Pentium 4, encoding in general doesn’t follow the same rules.  The Athlon does seem to have a more powerful FPU when it comes to standard x87 FP code, and since MP3 encoding isn’t exactly memory bandwidth intensive, AMD takes home an easy win here.

It will be interesting to see what a heavily optimized SSE2 encoder will be able to do for the Pentium 4; however, chances are that the true strengths of SSE2 will lie in its 64-bit SIMD-FP capabilities, which are better suited for professional applications than MP3 encoding.



Final Words

When we first reviewed the Pentium 4, the conclusion was simple: don’t be early adopters.  However today, the picture has changed quite a bit.  The Pentium 4’s prices have dropped or are in the process of dropping significantly, and the price of RDRAM has declined as well.  The processor is now clocked 200MHz higher than when we first looked at it, and we also have new benchmarks to truly stress the platform as well as its competitors (SYSMark 2001 and Benchmark Studio).  The real question is, have our recommendations changed?

The recommendations themselves haven’t changed, but now we can make much more specific suggestions as to what route you should consider.  Quite possibly the most useful benchmark in defining what type of user you are is SYSMark 2001.  If you are the type of person who encodes a video file while editing an image and a video, then switches to a web design application, puts together a demo in Flash or works through other scenarios like that, the Pentium 4 is better suited to your usage style.  This type of user does demand the greater amounts of bandwidth that the Pentium 4’s cache, FSB and memory bus can offer.  Be prepared to pay a premium over an Athlon, but at least you are getting greater performance for it.

The next category described by SYSMark 2001 is the Office Productivity user.  Interestingly enough, it seems like many users will fall into this category, as it refers not only to those who run MS Word all day but also to those who are surfing the net while checking their mail, scanning for viruses, unzipping files and performing other such tasks.  In this case, while the Pentium 4’s performance is respectable, the Athlon is king, especially because of its lower cost.

In terms of gaming performance, the current results are mixed.  In some situations (Quake III Arena, MBTR) the Pentium 4 is dominating, while in others it is tying (Serious Sam) or lagging behind the Athlon (UnrealTournament).  We will have to wait until more games are available for us to test before drawing a solid conclusion for the gamer; however, you really can’t go wrong with either setup as far as things stand today.

The Constant Computing performance of both platforms seems to be a draw as well.  At 1.7GHz for the Pentium 4 and at 1.33GHz for the Athlon you’re going to get the highest levels of performance you or your company can buy.  It is mainly an issue of cost, upgradeability, reliability and what other tasks you will be doing that will factor into the decision here. 

As far as conventional x87 FP applications are concerned, as well as those scientific applications that aren’t extremely memory bandwidth dependent, the Athlon does continue to hold the advantage because of its superior FPU.  Although SSE2’s 64-bit SIMD-FP capabilities may change that, we have yet to see evidence of that and probably won’t for quite some time to come.

In closing, Northwood (0.13-micron Pentium 4) is still on the way and AMD’s Palomino core will hopefully be hitting the streets in a couple of months, so the ideal recommendation is, of course, to wait.  Obviously for many this isn’t an option, in which case the above recommendations should help steer you in the right direction.  Another option to consider is a cheap intermediate upgrade to tide you over until the Palomino and Northwood are readily available, at which point you can upgrade again.

And just when you thought things had quieted down, they went ahead and got interesting again.
