AI Performance: UL Procyon AI Workloads

Drafting a set of benchmarks relevant to end-user AI use-cases has proved to be a challenging exercise. While training workloads are common in the datacenter and enterprise space, consumer workloads focus on inference. In the early days, inferencing ran in the cloud, but increasing privacy concerns, as well as the latency and cost penalties associated with constant cloud communication, have contributed to the rising demand for local inferencing capabilities. Additionally, generative AI (such as chatbots and prompt-based image generators) has garnered significant interest recently. Currently, most large language models (LLMs) and similar generative models run in the cloud, as they are still too resource-heavy to run with reasonable performance on the systems of average users.

UL's Procyon AI benchmark suite focuses on these workloads from an edge computing perspective. Broadly speaking, the benchmark is divided into two major components:

  • Computer Vision (inference performance using six different neural network models)
  • Generative AI (image generation using the Stable Diffusion model)

An attempt was made to process both benchmarks on the ACEMAGIC F2A 125H as part of the evaluation of its capabilities as an "AI PC". The results are summarized in the remainder of this section.

Computer Vision Neural Networks Performance

The six supported neural networks were benchmarked with the following configurations:

  • OpenVINO CPU with float32 precision
  • OpenVINO GPU with float16 precision
  • OpenVINO GPU with float32 precision
  • OpenVINO GPU with integer precision
  • OpenVINO NPU with float16 precision
  • OpenVINO NPU with integer precision
  • WinML GPU with float16 precision
  • WinML GPU with float32 precision
  • WinML GPU with integer precision

The OpenVINO configurations can be evaluated only on systems with an Intel CPU, GPU, or NPU. In general, a neural network model's accuracy / quality of results improves with precision. In other words, we expect float16 to deliver better results than integer, and float32 to be better than float16. However, increased precision requires more complex calculations, and that results in higher power consumption. As a general-purpose engine, the CPU is expected to be the most power-hungry of the lot, while the NPU, which is purpose-built for neural network acceleration, is expected to be more power-efficient than the GPU configurations. UL has a detailed study of the variation in the quality of results with precision for different networks in their benchmark resources section.
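The accuracy / power trade-off stems from quantization: lower-precision formats represent weights and activations on a coarser grid. A minimal, self-contained sketch of symmetric int8 quantization (a generic illustration, not OpenVINO's actual implementation) shows how the quantization step bounds the accuracy loss:

```python
# Symmetric int8 quantization: map float values to [-127, 127] and back.
def quantize_int8(values):
    scale = max(abs(v) for v in values) / 127.0
    q = [round(v / scale) for v in values]    # int8 representation
    return [x * scale for x in q], scale      # dequantized floats + step size

weights = [0.013, -1.27, 0.5004, 0.92]        # hypothetical model weights
deq, scale = quantize_int8(weights)
errors = [abs(a - b) for a, b in zip(weights, deq)]
# Each reconstructed value differs from the original by at most half the
# quantization step, so a larger dynamic range means coarser results.
assert max(errors) <= scale / 2 + 1e-12
```

The same reasoning extends to float16 versus float32: fewer mantissa bits mean a coarser representable grid, traded for cheaper arithmetic and lower power.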

The YOLO V3 network is used for real-time object detection in videos. The graphs below show that, at the same precision, OpenVINO performs better than WinML on the GPU. Additionally, for the same precision, OpenVINO performs better on the GPU than on the NPU.

UL Procyon AI - YOLO V3 Average Inference Time

The REAL ESRGAN network is used for upscaling images / restoration of videos and pictures. Relative performance for different precisions / execution hardware is similar to what was seen for the YOLO V3 network.

UL Procyon AI - REAL ESRGAN Average Inference Time

The ResNet 50 network is primarily used for image classification. Again, we see the NPU being slower than the GPU at the same precision, while WinML lags behind OpenVINO for the same underlying execution hardware and precision.

UL Procyon AI - ResNet 50 Average Inference Time

The MobileNet V3 network is used, among other things, for image processing tasks such as tilt correction. Similar to the other networks, WinML again lags behind OpenVINO. However, the NPU is faster than the GPU for the same precision network.

UL Procyon AI - MobileNet V3 Average Inference Time

The Inception V4 network, like the ResNet 50, is primarily used for image classification. Similar to most other networks, WinML performance is not as good as with OpenVINO, and the NPU is slower than the GPU for the same precision.

UL Procyon AI - Inception V4 Average Inference Time

The DeepLab V3 network is used for image segmentation. In other words, it identifies groups of pixels in an image that satisfy specific requirements. The NPU is almost 4x slower than the GPU for the same precision and network. OpenVINO continues to perform better than WinML for the same precision network.

UL Procyon AI - DeepLab V3 Average Inference Time

The UL Procyon AI Computer Vision benchmark run processes each model for 3 minutes, maintaining a count of inferences as well as the average time taken for each inference. It presents an overall score for all six models together, though it is possible that some networks perform better than others for the same hardware / precision configuration.
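The fixed-duration measurement described above is straightforward to sketch; the version below is a generic illustration, not UL's actual code, and `run_inference` is a hypothetical stand-in for a compiled model's inference call:

```python
import time

def benchmark(run_inference, duration_s=180.0):
    """Run inferences back-to-back for a fixed wall-clock window and
    report the inference count plus the average time per inference."""
    count, start = 0, time.perf_counter()
    while time.perf_counter() - start < duration_s:
        run_inference()
        count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed / count   # (inferences, avg seconds per inference)

# Example with a dummy 10 ms "model" and a shortened 0.5 s window:
count, avg = benchmark(lambda: time.sleep(0.01), duration_s=0.5)
```

Because the window is fixed, a faster configuration simply completes more inferences, and the average inference time falls out of the same two numbers.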

UL Procyon AI - Computer Vision Inferencing Overall Scores

The Revel Canyon NUC and the F2A 125H come out on top in the CPU-only OpenVINO run with float32 precision. For the OpenVINO GPU runs, the NUC BOX-155H manages to sneak in a slight lead over the other systems. Finally, WinML performance is quite bad compared to OpenVINO.

The benchmark runs for a fixed time. Hence, instead of tracking energy consumption, we opt to report the average at-wall power consumption for the system as a whole for each run set.

UL Procyon AI - Computer Vision Average Power Consumption

As expected, the NPU is the most power-efficient of the lot. Higher precision translates to higher power consumption, and CPU mode is the least power-efficient.
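Since every run lasts the same three minutes, average power is directly proportional to energy consumed, which is why reporting the average alone suffices. A trivial sketch with hypothetical 1 Hz power-meter readings:

```python
# Average at-wall power from periodic power-meter samples (watts).
# With a fixed run time, energy = average power x duration, so average
# power alone is enough to compare run sets.
samples_w = [28.4, 31.0, 30.2, 29.6, 30.8]   # hypothetical 1 Hz readings
avg_power_w = sum(samples_w) / len(samples_w)
energy_wh = avg_power_w * (180 / 3600)       # 3-minute run, in watt-hours
```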

Generative AI Performance

The Stable Diffusion prompt used for benchmarking in UL Procyon AI generates 16 different images. However, on all three system configurations, the benchmark crashed after generating 3 or 4 images. This benchmark is meant for high-end systems with discrete GPUs, and hence we did not investigate the crashes further.
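For reference, a generic harness along the following lines can isolate per-image failures instead of aborting the whole batch, assuming the failures surface as Python exceptions rather than hard process crashes; `generate_image` is a hypothetical stand-in for the actual pipeline call:

```python
# Attempt all 16 image generations and record which ones fail,
# instead of aborting on the first failure.
def run_image_batch(generate_image, n_images=16):
    results, failures = [], []
    for i in range(n_images):
        try:
            results.append(generate_image(i))
        except Exception as exc:            # a failed generation
            failures.append((i, repr(exc)))
    return results, failures

# Simulate the observed behaviour: generation fails after the 3rd image.
def flaky(i):
    if i >= 3:
        raise RuntimeError("out of device memory")
    return f"image_{i}.png"

images, failed = run_image_batch(flaky)
```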

As we get more systems processed with the UL Procyon AI benchmark, an attempt will be made to get the Generative AI benchmark working on them.

Comments

  • meacupla - Monday, August 26, 2024 - link

    It is amazing what a few extra cm of space does for the thermals.
    Bonus points for 2x2280 storage, but I wish it supported 3x2280 or 2x22110
  • AdrianBc - Tuesday, August 27, 2024 - link

    I agree with you, which is why I have liked a NUC-like computer with Ryzen AI 3xx that is expected to be launched in October, and for which some preliminary tests of a prototype have been shown on YouTube and linked on various computer-news sites.

    That computer has 3 M.2 2280 sockets, replacing the traditional NUC configuration with 1 M.2 2280 socket + 2 SODIMM sockets.

    This was possible because the SODIMM sockets were replaced with faster soldered LPDDRX memory, selectable as 16 GB, 32 GB or 64 GB.

    I have been using a lot of NUC-like computers for many years, and in my opinion for such computers it is far more useful to be able to install three full-size SSDs, than to be able to replace the DRAM. Therefore I approve the choice made by the designers of that computer.
  • Hulk - Monday, August 26, 2024 - link

    If it is running 4.5GHz during CB R23 ST then that result is horrendous. Like 20% lower IPC-wise (throughput) than a similarly clocked Raptor Cove core.
  • Techie4Us - Monday, August 26, 2024 - link

    Design, features & thermals good, low-tier ram & dram-less SSD.....not so much, especially at this price point....

    If they offered a barebones unit for like ~$400, I might be interested, otherwise..pass....

    Also the spec sheet says " OS = W11 Enterprise", then the pricing part right under that says "W11 HOME"... so which is it and how much difference does this make in the price ?
  • ganeshts - Tuesday, August 27, 2024 - link

    ACEMAGIC sells the system with Win 11 Home pre-installed.

    However, when we test mini-PCs, we always wipe and install Windows 11 Enterprise. It just gives us more features to customize the behavior and prevent surprises while benchmarking.

    The pricing includes the license for Win 11 Home (and that is why the mention of the Home variant is in the Pricing entry).
  • meacupla - Tuesday, August 27, 2024 - link

    IDK what you consider "high-tier ram", but DDR5 SODIMM maxes out at 5600.
  • eastcoast_pete - Monday, August 26, 2024 - link

    Ganesh's advice about wiping the drive and doing a complete new install of the OS before use is, unfortunately, spot on. Other sites and reviewers had found potential malware / spyware on at least one Acemagic mini-PC they evaluated. Acemagic did respond very quickly and tried to explain it away, but Ganesh is 100% correct in pointing out that wiping the drive and a fresh reinstall of the OS is the safe thing to do.
  • haplo602 - Tuesday, August 27, 2024 - link

    Why anybody would buy Intel-based mini-PCs is beyond my understanding. Unless you need Quicksync, the AMD-based ones are overall better.
  • nandnandnand - Tuesday, August 27, 2024 - link

    Meteor Lake-H has improved integrated graphics considerably. But it all comes down to price in the end.
  • haplo602 - Wednesday, August 28, 2024 - link

    Given what I have seen with the MSI Claw, it also has terrible power management/distribution between the CPU and GPU ...
