Modifying a Krait Platform: More Complicated

Modifying the Dell XPS 10 is a little more difficult than modifying Acer's W510 or Microsoft's Surface RT. In both of those products there was only a single inductor in the path from the battery to the CPU block of the SoC. The XPS 10, however, uses a dual-core Qualcomm solution. Ever since Qualcomm started doing multi-core designs it has opted to use independent frequency and voltage planes for each core. While all of the A9s in Tegra 3 and both of the Atom cores used in the Z2760 run at the same frequency/voltage, each Krait core in the APQ8060A can run at its own voltage and frequency. As a result, two power delivery circuits are needed to feed the CPU cores. I've highlighted the two inductors Intel lifted in orange:

Each inductor was lifted and wired with a 20 mΩ resistor in series. The voltage drop across the 20 mΩ resistor was measured and used to calculate CPU core power consumption in real time. Unless otherwise stated, the graphs here represent the total power drawn by both CPU cores.
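The arithmetic behind this measurement technique is simple Ohm's law. The sketch below shows the calculation; the function name and sample voltage values are illustrative, not measurements from the article:

```python
# Sketch of the shunt-resistor power calculation described above.
# The 20 mOhm value comes from the article; the sample voltages are made up.

R_SHUNT = 0.020  # ohms (the 20 mOhm series resistor)

def rail_power(v_drop, v_rail):
    """Power drawn on a rail, derived from the drop across the shunt.

    v_drop: voltage measured across the 20 mOhm resistor (volts)
    v_rail: voltage on the rail feeding the CPU core (volts)
    """
    current = v_drop / R_SHUNT   # I = V / R (Ohm's law)
    return current * v_rail      # P = I * V

# Example: a 15 mV drop on a 1.05 V core rail implies 0.75 A, ~0.79 W
core0 = rail_power(0.015, 1.05)
core1 = rail_power(0.008, 0.95)

# As in the graphs, total CPU power is the sum over both instrumented cores
total_cpu_power = core0 + core1
```

Sampling this continuously is what allows the per-core power to be plotted in real time.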

Unfortunately, that's not all that's necessary to accurately measure Qualcomm CPU power. If you remember back to our original Krait architecture article you'll know that Qualcomm puts its L2 cache on a separate voltage and frequency plane. While the CPU cores in this case can run at up to 1.5GHz, the L2 cache tops out at 1.3GHz. I remembered this little fact late in the testing process, and we haven't yet found the power delivery circuit responsible for Krait's L2 cache. As a result, the CPU specific numbers for Qualcomm exclude any power consumed by the L2 cache. The total platform power numbers do include it however as they are measured at the battery.

The larger inductor in yellow feeds the GPU and it's instrumented using another 20 mΩ resistor.

Visualizing Krait's Multiple Power/Frequency Domains

Qualcomm remains adamant about its asynchronous clocking with multiple voltage planes. The graph below shows power draw broken down by each core while running SunSpider:

SunSpider is a great benchmark to showcase exactly why Qualcomm has each core running on its own power/frequency plane. For a mixed workload like this, the second core isn't totally idle/power gated but it isn't exactly super active either. If both cores were tied to the same voltage/frequency, the second core would have higher leakage current than in this case. The counter argument would be that if you ran the second core at its max frequency as well it would be able to complete its task quicker and go to sleep, drawing little to no power. The second approach would require a very fast microcontroller to switch between v/f modes and it's unclear which of the two would offer better power savings. It's just nice to be able to visualize exactly why Qualcomm does what it does here.
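The tradeoff between the two approaches can be made concrete with a toy energy calculation. All of the power and frequency figures below are illustrative assumptions, not measured values from the article; the point is only that the winner depends entirely on the leakage/idle numbers you plug in:

```python
# Toy comparison of the two strategies discussed above:
# per-core DVFS (run the lightly loaded core slowly the whole time) vs.
# race-to-sleep (sprint at max frequency, then power-gate).
# All numbers are made up for illustration.

def energy_slow_core(work, f_low=0.384, p_low=0.10):
    """Run the background work at a low frequency for the whole window."""
    duration = work / f_low       # seconds to finish at the low clock
    return duration * p_low       # joules = watts * seconds

def energy_race_to_sleep(work, f_high=1.5, p_high=0.75, p_idle=0.005):
    """Sprint at max frequency, then idle for the rest of the same window."""
    active = work / f_high                  # time spent sprinting
    window = work / 0.384                   # same window as the slow case
    return active * p_high + (window - active) * p_idle

work = 0.5  # arbitrary units of background work
e_dvfs = energy_slow_core(work)
e_race = energy_race_to_sleep(work)
# With these particular numbers per-core DVFS wins, but a leakier process
# or a lower idle power would flip the result -- which is exactly why the
# article treats it as an open question.
```

Note that the race-to-sleep path also has to pay for the fast voltage/frequency switching the article mentions, which this toy model ignores.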

On the other end of the spectrum however is a benchmark like Kraken, where both cores are fairly active and the workload is balanced across both cores:


Here there's no real benefit to having two independent voltage/frequency planes, both cores would be served fine by running at the same voltage and frequency. Qualcomm would argue that the Kraken case is rare (single threaded performance still dominates most user experience), and the power savings in situations like SunSpider are what make asynchronous clocking worth it. This is a much bigger philosophical debate that would require far more than a couple of graphs to support and it's not one that I want to get into here. I suspect that given its current power management architecture, Qualcomm likely picked the best solution possible for delivering the best possible power consumption. It's more effort to manage multiple power/frequency domains, effort that I doubt Qualcomm would put in without seeing some benefit over the alternative. That being said, what works best for a Qualcomm SoC isn't necessarily what's best for a different architecture.

140 Comments

  • kyuu - Friday, January 4, 2013 - link

    You're the one stepping into the past with the CISC vs. RISC. x86 is not going to go away anytime soon. Keep dreaming, though.
  • iwod - Saturday, January 5, 2013 - link

    Nothing about architectures in this comment, but by the time ARM's Cortex A57 is out, so is Intel's ValleyView, which doubles the performance. A57 is expected to give, in the best case, a 30 - 50% increase in performance. All of a sudden this looks a lot like 2x Atom performance.

    It will only take one, just ONE mistake from ARM for Intel to possibly wipe them off the map.

    Looking 3 - 5 years ahead, though, it will be a bloody battle.
  • Cold Fussion - Friday, January 4, 2013 - link

    Why weren't there any charts showing performance per watt or energy consumption vs. performance in the GPU area? If the Mali chip is using twice the energy but giving 3x the performance then that is a very significant point that's being misrepresented.
  • mrdude - Friday, January 4, 2013 - link

    I was thinking the same thing.

    If I can game at native resolution on a Nexus 10 at better frame rates than on the Atom or Snapdragon SoC and the battery capacity is larger and the price of the device is equal, then do I really care about the battery life?

    Although it's nice seeing Intel is getting x86 down to a competitive level with ARM, the most astonishing thing that I took away from that review was just how amazing that MaliT604 GPU is. All that performance and only that power draw? Yesplz :P
  • parkpy - Friday, January 4, 2013 - link

    I've learned so much from AT's reviews of the iPhone 5, Galaxy S III, and Nexus 4, and this article about mobile phones, that it makes me wish AT could produce MORE reviews of mobile devices.

    All of this information is crack! I can't get enough of it. Keep up the good work! And Intel, I can't wait for you to get your baseband processor situation sorted out!

    I was already tempted to get a Razr I, but it looks like before the end of the year consumers will have some very awesome technology in their phones that won't require as much time on the battery charger!
  • This Guy - Friday, January 4, 2013 - link

    What if Rosepoint is software defined instead of fixed function?
  • ddriver - Friday, January 4, 2013 - link

    I am confused here - this review shows the Atom to be somewhat faster than the A15, while the review at Phoronix shows the A15 destroying the Atom, despite the fact Intel's compiler is incredibly good at optimizations and incomparably more mature.

    So I am in a dilemma on who to trust - a website that is known to be generously sponsored by intel or a website that is heavily focused on open source.

    What do you think?
  • kyuu - Friday, January 4, 2013 - link

    Uh, did we read the same article? Where does it show the Atom being "somewhat faster than A15"? The article showed that the A15 is faster than Atom, but at a large power premium.
  • ddriver - Friday, January 4, 2013 - link

    On the charts I see the blue line ending its task first and taking less time, blue is atom, right?
  • jwcalla - Friday, January 4, 2013 - link

    A couple things:

    1) The Phoronix benchmarks were for different Atoms than the one used in this article. I don't know how they compare, but they're probably older models.

    2) The Phoronix benchmarks used GCC 4.6 across the board. Yes, in general GCC will have better optimizations for x86, but we don't know anything (unless I missed it) about which compilers were used here. If this was an Intel sample sent to Anand, I'm sure they compiled the software stack with one of their own proprietary Intel compilers. Or perhaps it is the MS compiler, which no doubt has decades of x86 optimizations built in and probably less ARM work than GCC (for the RT comparison).

    Don't take the benchmarks too seriously, especially since even the software isn't held constant here like it was in the Phoronix benchmarks. It's all ballpark information. Atom is competitive with ARMv7 architectures -- that's the takeaway.
