Name: Intel Developer Forum - Beijing 2007: Penryn and Intel's High End GPU
Author: Anand Lal Shimpi

Original Link: https://www.anandtech.com/show/2211

by Anand Lal Shimpi on April 16, 2007 9:00 PM EST

Intel quickly expanded its developer forum to include many stops across the globe, but after recent cost cutting measures it scaled the number of IDFs back to just three. The Spring IDF, normally held in San Francisco, has instead moved to Beijing, while Fall IDF will take place in SF as usual.

China is a logical location for an IDF given the technology industry’s focus on its booming population, and this IDF comes at a particularly open time for Intel, making it even more likely to be the site of great disclosures.

This Spring IDF follows on the heels of Intel’s Penryn/Nehalem announcements, and while we won’t get nearly as many details out of it, there are definitely important items to come.

Intel’s Justin Rattner spoke briefly about Intel’s plans for producing even lower powered CPUs before the end of the decade; the target being a 10x reduction in power consumption for at least one chip in Intel’s lineup by 2010. Rattner also demonstrated a 2 TFLOP version of Intel’s 80-core Teraflops research chip, most likely running at a higher clock speed to hit the 2 TFLOPs figure.
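As a rough sanity check on that clock speed guess, peak throughput can be estimated as cores x FLOPs per cycle x clock. The sketch below takes the 80 cores from Intel’s description and assumes each core completes 4 floating point operations per cycle; that per-core figure is our assumption, not something Intel stated at this IDF, and the function names are purely illustrative.

```python
# Back-of-envelope peak throughput estimate for a many-core research chip.
CORES = 80            # from the article: Intel's 80-core Teraflops research chip
FLOPS_PER_CYCLE = 4   # ASSUMPTION: per-core FLOPs per cycle, not stated in the article


def peak_tflops(clock_ghz: float) -> float:
    """Peak TFLOPs at a given clock, assuming every core is busy every cycle."""
    return CORES * FLOPS_PER_CYCLE * clock_ghz / 1000.0


def required_clock_ghz(target_tflops: float) -> float:
    """Clock speed (GHz) needed to reach a peak TFLOPs target."""
    return target_tflops * 1000.0 / (CORES * FLOPS_PER_CYCLE)


if __name__ == "__main__":
    print(f"Clock needed for 2 TFLOPs: {required_clock_ghz(2.0):.2f} GHz")
```

Under that assumption, doubling the throughput target with the same core count simply doubles the required clock, which is why a higher clock speed is the most likely explanation for the 2 TFLOPs demo.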

A Trail of Skulls

An unusual announcement came out of IDF today about Intel’s upcoming Skulltrail platform. Skulltrail is effectively a competitor to AMD’s Quad FX platform: a dual socket motherboard capable of supporting two quad-core CPUs. Intel lists Skulltrail as having support for “four PCI Express slots”, which we can only assume means four PCIe x16 slots.

Skulltrail won’t make its debut until later this year and Intel isn’t confirming whether it uses LGA-775 chips or Xeons. The predecessor to Skulltrail is Intel’s V8 platform, originally demonstrated at CES 2007 in Las Vegas, and unfortunately that platform requires the use of Xeon CPUs. Since it’s effectively a Xeon based workstation platform, you’re stuck with using FB-DIMMs, which are hardly desirable from a cost, performance or thermal standpoint when building an enthusiast class PC. We do hope that Skulltrail severs the tie with Xeon/FB-DIMMs, but we are quite suspicious of Intel’s lack of full disclosure today at IDF.

The Long Awaited Penryn Update

In an unprecedented move, Intel made a very full disclosure of its first 45nm processor family, codenamed Penryn, a couple of weeks ago. There were some vague elements of the initial Penryn disclosure that we’ve since cleared up.

First off, Penryn is designed to support up to a 1600MHz FSB. However, we wondered whether desktop chips would even see the faster FSB support, given that we haven’t so much as heard of support for it on Intel’s upcoming 3 series chipsets (e.g. P35, X38). It turns out that Intel is only confirming 1600MHz FSB support for Penryn based Xeon processors for the HPC market, not for the mobile or desktop markets. This tells us two things: 1) Intel is feeling AMD’s bandwidth advantage and strength in the HPC market and is using the faster FSB to help level the playing field, and 2) the desktop will most likely not see an FSB faster than 1333MHz.
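To put the FSB speeds in bandwidth terms, peak theoretical throughput is simply the transfer rate multiplied by the bus width. The sketch below assumes a 64-bit (8-byte) data bus and uses the nominal marketing speeds as transfer rates; the function name is ours, and real-world sustained bandwidth will be lower than these peaks.

```python
# Peak theoretical FSB bandwidth = transfer rate x bus width.
FSB_WIDTH_BYTES = 8  # ASSUMPTION: 64-bit data bus


def fsb_peak_bandwidth_gbs(mega_transfers_per_s: float) -> float:
    """Return peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return mega_transfers_per_s * 1e6 * FSB_WIDTH_BYTES / 1e9


if __name__ == "__main__":
    # Nominal FSB speeds mentioned in the article.
    for rate in (1066, 1333, 1600):
        print(f"{rate}MHz FSB: {fsb_peak_bandwidth_gbs(rate):.1f} GB/s peak")
```

On these numbers a 1600MHz FSB offers about 20% more peak bandwidth than a 1333MHz one, which is the kind of headroom that matters most to bandwidth-bound HPC workloads.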

Remember that with Nehalem being introduced in 2008, Intel will begin shifting away from its aging FSB architecture to a point-to-point interface akin to what AMD introduced with the K8 back in 2003. It doesn’t make a lot of sense for Intel to invest much money into moving cost focused desktop platforms to 1600MHz FSB only to abandon the efforts in a year’s time. While Intel hasn’t said anything, we’re expecting Penryn desktop parts to be 1333MHz FSB only, which makes sense given that the upcoming P35 chipset officially supports a maximum FSB frequency of 1333MHz.

Intel also mentioned that Penryn would support SSE4, but is its implementation complete or will we have to wait until Nehalem for that? It turns out that Penryn will support a total of 47 SSE4 instructions, not the full implementation of the ISA extensions. There will be an additional 7 instructions that Intel is stating will come in future microprocessors. We’re assuming that Intel is talking about Nehalem, but it’s not yet set in stone.

The two interesting power related technologies that will make their debut with Penryn will apparently be mobile-only for now. Intel’s C6 state and EDAT (Enhanced Dynamic Acceleration Technology) will only be supported on mobile Penryn platforms given the nature of the two features. As a recap, the C6 power state allows for an extremely low power operating mode while idle, the closest thing to a full reset of the CPU. Data is completely expelled from the on-die caches and the caches themselves are powered off, while core voltage is reduced to the lowest amount allowed by the process. The CPU’s state is saved in some on-chip storage, then the majority of the chip is powered down into a virtually off state. Recovery from C6 is possible: the saved state is read back from that on-chip storage and the chip is powered up as it would be from reset, but with memory of what it was doing before it entered C6. The wakeup process does take some time (not noticeable to the user) and thus impacts performance, so C6 is best suited to mobile environments where the gain in battery life is worth the reduction in system performance.
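The C6 entry and exit sequence described above can be modeled as a tiny state machine. This is only a conceptual sketch: the `Core` class, its fields, and the idea that expelled cache data lands in a `memory` dictionary are all our own illustrative assumptions, not Intel’s implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Core:
    """Toy model of a CPU core; every name here is hypothetical."""
    registers: dict = field(default_factory=dict)   # architectural state
    cache: dict = field(default_factory=dict)       # on-die cache contents
    saved_state: Optional[dict] = None              # stand-in for the on-chip save area
    powered: bool = True

    def enter_c6(self, memory: dict) -> None:
        # 1. Expel cached data (modeled as a write-back to memory), then empty the cache.
        memory.update(self.cache)
        self.cache.clear()
        # 2. Save the CPU state to on-chip storage.
        self.saved_state = dict(self.registers)
        self.registers.clear()
        # 3. Power down the majority of the chip.
        self.powered = False

    def exit_c6(self) -> None:
        # Power up as if from reset, then restore the saved state.
        self.powered = True
        self.registers = dict(self.saved_state or {})
        self.saved_state = None


if __name__ == "__main__":
    memory = {}
    core = Core(registers={"ip": 0x1000}, cache={0x40: "dirty line"})
    core.enter_c6(memory)
    print("powered:", core.powered, "cache:", core.cache, "memory:", memory)
    core.exit_c6()
    print("restored registers:", core.registers)
```

Note that the cache comes back empty after wakeup in this model, which hints at why leaving C6 carries a performance cost beyond the wakeup latency itself.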

Intel’s EDAT is the other mobile-only Penryn technology Intel talked about in its disclosure. It allows the clock speed of one core on a mobile Penryn to be increased when the other core is not in use. The idea is simple: in a notebook you are constrained by the cooling system used, not by the maximum clock speed attainable by the CPU itself. When running single threaded applications (or multithreaded applications with only one CPU intensive thread), the remaining core can power down, reducing the total thermal footprint of the CPU itself. An EDAT enabled mobile Penryn can then detect that only one core is being used and increase the clock speed of that operational core by a single speed bin (e.g. 2.40GHz to 2.66GHz) in order to provide a boost in performance to that one active thread. Once again, EDAT will be mobile-only.
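The EDAT policy boils down to a simple rule: if only one core is active, step that core up one speed bin. Here is a minimal sketch of that rule. The list of speed bins is illustrative (only the 2.40GHz to 2.66GHz step comes from Intel’s example), and the function name and signature are our own.

```python
# ASSUMPTION: illustrative speed bins; only the 2.40 -> 2.66 step is from the article.
SPEED_BINS_GHZ = [2.13, 2.40, 2.66]


def edat_clock(base_ghz: float, active_cores: int, bins=SPEED_BINS_GHZ) -> float:
    """Return the clock for the active core(s) under a one-bin boost policy."""
    if active_cores == 1:
        i = bins.index(base_ghz)
        if i + 1 < len(bins):
            # The idle core is powered down, so there is thermal headroom for one bin.
            return bins[i + 1]
    # Both cores busy (or already at the top bin): stay at the rated speed.
    return base_ghz


if __name__ == "__main__":
    print("one core active:", edat_clock(2.40, active_cores=1), "GHz")
    print("two cores active:", edat_clock(2.40, active_cores=2), "GHz")
```

The boost is capped at a single bin because the limit is the notebook’s cooling budget, not what the silicon can reach.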

Finally, with regard to motherboard support, Intel isn’t making any guarantees about Penryn’s backwards compatibility. While Penryn will still use the LGA-775 socket that Prescott and Conroe have used, motherboard support will require more than just the presence of the socket. If the appropriate VRM spec is implemented, then Penryn will work on your LGA-775 motherboard. The problem is that motherboard manufacturers haven’t yet released information on which of their boards will support the Penryn VRM changes. If history repeats itself, you can expect very limited official support for Penryn in currently shipping motherboards and guaranteed support with boards based on Intel’s new 3 series chipsets (e.g. P35). We did see Penryn up and running on an Intel BadAxe2 board, but it had a hardware VRM modification done to it in order to properly support Penryn. Penryn may also be able to work on boards without a VRM mod, but only at increased (potentially out-of-spec) voltage settings.

At IDF Beijing Intel unveiled a little more about Penryn performance; it compared a quad-core 3.33GHz (1333MHz FSB) Yorkfield with 12MB of L2 cache (2 x 6MB per dual core die) to a quad-core Core 2 Extreme QX6800 2.93GHz (1066MHz FSB) Kentsfield with 8MB of L2 cache (2 x 4MB). According to Intel’s own benchmarks, Intel saw a 15% increase in imaging related applications, 25% in 3D rendering tests, greater than 40% in games, and a greater than 40% increase in video encoding performance when SSE4 support was utilized.

Obviously some of the performance improvement can be attributed to the higher clock speed and faster FSB of the Yorkfield system, while the remainder would be due to the architectural enhancements and larger cache of Penryn. The percentage improvement Intel is indicating with Penryn is quite high, but as we’re comparing across different clock speeds it’s a bit of a skewed comparison. Don’t expect Penryn to have the same performance impact that Conroe did upon its introduction, but rather expect an evolutionary continuation of the performance we’ve seen from Intel thus far. Unlike the other P in Intel’s codename history, there are no terrible surprises with Penryn that will result in a step back in performance.
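One way to see how skewed the comparison is: divide out the clock speed advantage and look at what is left. The sketch below assumes performance scales perfectly linearly with clock speed, which overstates the contribution of the clock, so the residual is best read as a rough lower bound on what the FSB, cache and architecture contribute together. The function name is ours.

```python
YORKFIELD_GHZ = 3.33    # from Intel's comparison
KENTSFIELD_GHZ = 2.93   # Core 2 Extreme QX6800

CLOCK_RATIO = YORKFIELD_GHZ / KENTSFIELD_GHZ  # roughly a 13.7% clock advantage


def residual_gain_pct(reported_gain_pct: float) -> float:
    """Gain left over after removing the clock advantage.

    ASSUMPTION: performance scales linearly with clock speed.
    """
    return ((1 + reported_gain_pct / 100.0) / CLOCK_RATIO - 1) * 100.0


if __name__ == "__main__":
    reported = {"imaging": 15, "3D rendering": 25, "games / SSE4 video encoding": 40}
    for workload, gain in reported.items():
        print(f"{workload}: {gain}% reported, "
              f"{residual_gain_pct(gain):.1f}% after clock normalization")
```

Under that assumption the 15% imaging gain shrinks to roughly 1%, the 25% rendering gain to about 10%, and the 40% figures to about 23%, so the larger cache, faster FSB and SSE4 matter most in games and video encoding.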

Project Larrabee

Intel describes project Larrabee as a “highly parallel, IA-based programmable architecture” that will be “easily programmable using many existing software tools, and designed to scale to trillions of floating point operations per second...” Intel goes on to say that “[the Larrabee architecture] will include enhancements to accelerate applications such as scientific computing, recognition, mining, synthesis, visualization, financial analytics and health applications.”

Intel would not say any more about project Larrabee, other than to confirm that it has begun planning products based around the architecture. Looking at the statements above, we can deduce one thing already.

Being based on IA, we expect Larrabee to implement some instance of the x86 ISA, but the real clue comes from the 3 TFLOPs performance target. Let’s get this straight: Larrabee is a super wide, FP powerhouse architecture that can do a better job at accelerating scientific computing applications than current Intel CPUs? Larrabee sure does sound a lot like a high end GPU.

Intel didn’t attach a timeframe to these Larrabee projects other than to say that they were in the initial planning stages now. It is highly unusual for Intel to come out and say that it is working on a very vague new architecture, so we can only assume that there is some sort of political motivation behind the Larrabee disclosure.

Larrabee will be an important architecture to watch, and we expect to hear more about it at this fall’s IDF back in the US.

More Vague Projects from Intel

Intel announced two other new projects that it’s working on, both of them less vague than the Larrabee announcement but still lacking in details. Pat Gelsinger unveiled Intel’s Tolapai project, a system on a chip (SoC) architecture for the enterprise market. The magic year for Tolapai is 2008, when Intel expects a high level of integration to reduce the footprint of the chip by up to 45% and power consumption by approximately 20% compared to “a standard four-chip design”. We can only assume that a four-chip design means CPU, North Bridge, South Bridge and Graphics. The 2008 introduction makes sense given that in 2008 Intel will introduce Nehalem, which will offer configurations with an integrated North Bridge and optional integrated graphics.

Intel will also be working on an SoC designed for the consumer electronics market in 2008, lending further credibility to many of AMD’s reasons for acquiring ATI. The real question is whether Intel will be able to pull off dominance in the CE market without acquiring an external graphics firm.

Final Words

We’re expecting more announcements out of Intel in the next two days, so stay tuned for continued coverage of IDF Beijing.