Marvell's ARMADA: Custom Designed ARM SoCs Break 1GHz
by Anand Lal Shimpi on October 19, 2009 8:00 AM EST, posted in CPUs
Intel was an ARM architecture licensee until 2006, when it sold its XScale division to Marvell. Intel had grown too large and too unfocused, and its core business had suffered as a result. Make no mistake: the goal wasn’t to shift focus back to the desktop, but back to x86.
It wouldn’t be until 2008 that Intel would reveal its more focused strategy unto the world: Atom.
Intel's Atom processor core
While ARM and its licensees played Atom down as nothing remotely threatening, all of them knew that it was only a matter of time. Publicly they reasserted ARM’s dominance in the market: four billion ARM chips shipped last year alone, while Intel sold on the order of tens of millions of Atoms. But privately, the wheels were in motion.
ARM inked a deal with Globalfoundries, AMD’s manufacturing arm, to bring ARM-based SoCs to its fabs. This gives ARM the sort of modern manufacturing it needs to compete with Intel. The second thing that’s changed is that ARM licensees are now much more eager to talk about their architectures and what makes them special.
ARM offers two licensing arrangements to its partners: a processor license or an architecture license. A processor license allows the partner to take an ARM-designed core and implement it in its own SoC. An architecture license allows the partner to implement an ARM instruction set in a processor of its own design. The former is easier to implement, while the latter gives the licensee the freedom to optimize the architecture for its specific needs.
The Palm Pre - Powered by ARM
Companies like Samsung and TI hold ARM processor licenses. The Cortex A8 used in the iPhone 3GS (Samsung) and the Palm Pre (TI) is licensed directly from ARM. Marvell, however, has been an ARM architecture licensee for the past five years.
It’s an ARMADA
Marvell is introducing a fleet of new SoCs (systems on a chip) under the brand ARMADA. Get it?
The four ARMADA series and their target markets are below:
| Series | Target markets |
|---|---|
| ARMADA 100 Series | eBook readers, digital photo frames, portable navigation devices, etc. |
| ARMADA 500 Series | MIDs, netbooks |
| ARMADA 600 Series | Smartphones, MIDs |
| ARMADA 1000 Series | Blu-ray players, TVs, set-top boxes |
These are SoCs, so each one integrates the CPU, GPU, I/O and networking on a single chip. The entire ARMADA line is built on TSMC’s 55nm process. The 100 is a very low-performance part, suited to eBook readers, digital photo frames, IP cameras and the like. The 1000 is a multi-core version of the 100 with additional blocks designed for Blu-ray players, digital TVs and HD set-top boxes.
Both the 100 and 1000 are based on Marvell’s Sheeva PJ1 ARM core. This core uses the ARMv5 instruction set like the ARM9 processor, but performance-wise it should be comparable to an ARM11 implementation.
It’s a single-issue, in-order core with data forwarding support. The core is a hybrid of work from the original Marvell CPU team and the XScale CPU team that Marvell acquired in 2006. The pipeline is 5 to 8 stages deep, depending on the instruction group.
The core has two separate ALUs (a simple single-cycle ALU and a complex two-cycle ALU), a load/store unit and a multiply unit. The ARMv5 instruction set doesn’t explicitly require floating point, so all FP operations are handled by a separate coprocessor. Integer SIMD is handled by a separate Wireless MMX2 unit.
Marvell wouldn’t reveal die sizes but indicated that the PJ1 is comparable to ARM11 based designs in both size and power characteristics.
The more interesting SoCs are in the ARMADA 500 and 600 families. They use the Sheeva PJ4 core, Marvell’s answer to the Cortex A8.
The ARM Cortex A8 is an in-order, dual-issue microprocessor with a 13-stage integer pipeline, clocked at around 600-800MHz in today’s devices. Marvell’s PJ4 core implements the same ARMv7 instruction set, but its microarchitecture is very different. It’s still an in-order, dual-issue core, but the integer pipeline is only 6-9 stages long, depending on the instruction.
The shorter pipeline apparently doesn’t come at the expense of clock speed. Through the use of some custom logic Marvell is able to deliver clock speeds greater than 1GHz.
Both L1 and L2 caches are supported, just like the Cortex A8.
The biggest issue I can see with Marvell’s PJ4 is that it doesn’t support ARM’s NEON SIMD/floating-point instruction set. Marvell argues that Wireless MMX2 has greater market penetration than NEON. Given the limited use of the Cortex A8 in the market today, I don’t see the lack of NEON compatibility as a major issue for now, but it could become one down the road depending on developer uptake.
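To make the NEON versus Wireless MMX point concrete: software that wants SIMD acceleration on both Cortex A8 parts and Sheeva PJ4 parts would have to pick a code path at run time. Below is a minimal sketch in Python, assuming a 32-bit ARM Linux system whose /proc/cpuinfo "Features" line lists `neon` on NEON-capable cores and `iwmmxt` for the Wireless MMX unit. Whether a given kernel reports Marvell’s WMMX2 under that flag is an assumption, the function names and path labels are hypothetical, and the sample strings are made up for illustration rather than captured from real hardware.

```python
def parse_features(cpuinfo_text: str) -> set:
    """Collect the feature flags from every 'Features' line of /proc/cpuinfo-style text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # 32-bit ARM Linux prints lines such as "Features\t: swp half thumb ... neon"
        if sep and key.strip() == "Features":
            features.update(value.split())
    return features


def pick_simd_path(features: set) -> str:
    """Choose a hypothetical SIMD code-path label from the reported flags."""
    if "neon" in features:
        return "neon"    # Cortex A8-class core with NEON
    if "iwmmxt" in features:
        return "wmmx"    # assumption: Wireless MMX reported as 'iwmmxt'
    return "scalar"      # fall back to plain ARM/VFP code


if __name__ == "__main__":
    # Made-up sample strings, not real device output
    cortex_a8 = "Processor\t: ARMv7 Processor rev 3 (v7l)\nFeatures\t: swp half thumb fastmult vfp edsp neon vfpv3"
    sheeva_pj4 = "Features\t: swp half thumb fastmult vfp edsp iwmmxt thumbee vfpv3"
    print(pick_simd_path(parse_features(cortex_a8)))
    print(pick_simd_path(parse_features(sheeva_pj4)))
```

On a real device you would pass in the contents of /proc/cpuinfo. Keeping the parsing separate from the decision makes it easy to test either half without ARM hardware.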
On paper, the PJ4 would appear to have much higher IPC and clock speed than the Cortex A8. Marvell was unwilling to share any power or performance data at this time, so it remains to be seen exactly how well its architecture competes in the real world. On paper and at a high level, though, it looks good.
26 Comments
PandaBear - Thursday, October 22, 2009
The reason CISC is easy on code size is that a lot of instructions and data are smaller than 32 bits. ARM has the Thumb instruction set, whose instructions are only 16 bits wide, so most of the code can be crammed into a smaller size. When you are talking about a code size of 48kB and an OS that is only 6kB, any saving counts. The 14-cent processor is the very low-end one; the more powerful ones cost more, of course. The processor in your flash drive costs only around that much.
Sc4freak - Monday, October 19, 2009
Yes, it's the other way around. CISC architectures were designed to reduce memory pressure by cramming more work into each instruction. This reduces memory bandwidth requirements but increases chip complexity.
ProDigit - Monday, October 19, 2009
Interesting article! It's only a pity that they are still manufacturing at 55nm. But even at 55nm, it can beat Intel's Atom on power, albeit with somewhat lower performance.
Once this ARM chip is manufactured on the same process as the Atom, it could very well outperform the Atom!
The only con is that it does not have any form of Hyper-Threading, which would make use of idle parts of the core and reduce system response time.
It will have to compete against the Atom SoC, which may offer better 3D graphics (even though Intel graphics are pretty crappy, they might be better than ARM's graphics chip), first at 45nm and later at 32nm.
No matter how optimized the chip is, it's hard for a 55nm chip to beat a 32nm one in performance and power draw.
Lekko - Monday, October 19, 2009
What if you put a conductive sheet under the processor to effectively double the thickness of the pads on the underside? It would give you a bit more contact, especially if you used a softer conductive material for the pins to press into. That could be a potential $15 fix for the issue. Someone would just need to manufacture a sheet with conductive pads in the same array.
bobsmith1492 - Monday, October 19, 2009
Wrong article? That sounds like a good idea, though (I'm assuming you mean the i5 pin contact issue).
Ronamadeo - Monday, October 19, 2009
Cortex A8 vs. Snapdragon vs. this. This is getting insane; we need phone benchmarks. Now-ish.
I want to know whether the A8 is faster than a Snapdragon.
roymbrown - Monday, October 19, 2009
Snapdragon IS Cortex A8, as is TI's OMAP3xxx and Samsung's processor in the iPhone 3GS. Other features (graphics, I/O, cache) may vary, but the processor core in all of these is identical. This is what makes the Marvell offering unique.
Sc4freak - Monday, October 19, 2009
It's not a Cortex A8 in Snapdragon. Rather, it's a Qualcomm-customized core that's similar to the A8 but runs at a higher clock speed (1GHz).
roymbrown - Thursday, October 22, 2009
I stand corrected. Sorry for the misinformation, and thanks for the correction. Cortex A8 vs. Scorpion (the processor core in Snapdragon) vs. Sheeva benchmarks would indeed be interesting.
Is Scorpion "based" on Cortex A8 beyond the instruction set? Most articles I see just say "similar to Cortex A8", and the ARM licensees page lists them under ARM11 licensees, but not Cortex. Does anyone know if Scorpion is code-compatible with Cortex A8, or are there instruction set differences, like Sheeva's lack of NEON?
Randomblame - Monday, October 19, 2009
Cool beans. Maybe this will push the price of Snapdragon chips down a bit and make faster phones cheaper. I just can't wait to ditch the MSM7201A; it is the slowest, most horribly underperforming chip in the universe.
In 2002 Microsoft changed the name of its Pocket PC OS to Windows Mobile. Every year or two since then they've changed the way it looks ever so slightly. Nearly 8 years later the OS is the same damned thing, but the processors have shrunk 4 times and been redesigned over and over, yet they run this operating system just as slowly as they did 8 years ago. I'm ready for some changes.