Linley Newsletter: June 6, 2019

Linley Newsletter

Please feel free to forward this to your colleagues

Issue #655

June 6, 2019


Independent Analysis of Microprocessors and the Semiconductor Industry

Editor: Tom R. Halfhill

Contributors: Linley Gwennap, Mike Demler, Bob Wheeler


In This Issue:

- Arm CPUs Approach Intel Complexity

- Arm Mali-G77 Goes to Valhall

- Achronix Debuts FPGAs for AI


Arm CPUs Approach Intel Complexity

By Linley Gwennap

Despite the slowdown in Moore's Law, smartphone makers are determined to deliver better performance each year. To do so, they're investing more transistors in CPUs, either of their own design or licensed from Arm. When normalized to the same IC process (a proxy for transistor count), Samsung's custom M4 is about 70% of the size of Intel's Skylake, the most powerful CPU in common use today. Arm's high-end Cortex CPUs are considerably smaller but are quickly catching up: Cortex-A76 is more than twice the size of Cortex-A73, which came out just two years earlier.

Bigger isn't always better. Although Samsung has blown past Apple in CPU size, it still trails in both performance and instructions per clock (IPC). In fact, the custom Vortex CPU in Apple's A12 processor has a greater IPC than the latest Skylake-based products (including Coffee Lake). This comparison isn't exactly fair, as the Intel CPU is designed to reach twice the A12's top speed despite its lagging manufacturing process. The longer pipeline required to operate at 5GHz depresses IPC but increases peak performance, albeit at higher power. In mobile devices, Skylake operates at much lower clock speeds yet still delivers competitive performance.
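
As a rough illustration of this IPC-versus-frequency tradeoff, the sketch below models peak single-thread throughput as IPC times clock speed. The clock values follow the text (about 5GHz for the Intel CPU and half that for the A12). The IPC figures and the function name peak_throughput are hypothetical placeholders chosen only to show the arithmetic, not measured values.

```python
def peak_throughput(ipc: float, clock_ghz: float) -> float:
    """Peak single-thread throughput in billions of instructions per second."""
    return ipc * clock_ghz


# Clock speeds from the text: roughly 5GHz for Intel, half that for the A12.
INTEL_CLOCK_GHZ = 5.0
A12_CLOCK_GHZ = INTEL_CLOCK_GHZ / 2

# Hypothetical IPC values (assumption): the A12 core is given a 30% IPC lead.
INTEL_IPC = 1.0
A12_IPC = 1.3

intel_peak = peak_throughput(INTEL_IPC, INTEL_CLOCK_GHZ)
a12_peak = peak_throughput(A12_IPC, A12_CLOCK_GHZ)

# IPC the lower-clocked core would need to match the higher-clocked one.
breakeven_ipc = INTEL_IPC * INTEL_CLOCK_GHZ / A12_CLOCK_GHZ

print(f"Intel-class peak: {intel_peak:.2f}")
print(f"A12-class peak:   {a12_peak:.2f}")
print(f"Break-even IPC at half the clock: {breakeven_ipc:.1f}x")
```

Under these assumed numbers, the higher-IPC core still has lower peak throughput because it runs at half the clock speed. It would need twice the IPC to break even.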

Many of the newest smartphone processors move to a three-tier CPU cluster. Instead of four identical big cores, they have one or two cores that deliver maximum performance plus a second tier of "middle" cores. This approach can reduce die area and power while optimizing for typical use cases that have only one or two heavy threads. These processors also have a set of small cores to efficiently handle low-performance tasks. The recent Exynos 9820, for example, features two custom M4 CPUs, two Cortex-A75s, and four Cortex-A55s. Qualcomm and Huawei use Cortex-A76 for both the high and middle tiers by creating two different physical designs.

Microprocessor Report subscribers can access the full article:

https://www.linleygroup.com/mpr/article.php?id=12151

Arm Mali-G77 Goes to Valhall

By Mike Demler

The Mali-G77 is the first GPU based on Arm's new Valhall architecture. Compared with a Bifrost-based Mali-G76 having the same number of shaders, operating at the same frequency, and manufactured in the same technology, it increases area and power efficiency by 30%. But the biggest improvement is a 60% performance increase for the general-purpose-GPU (GPGPU) operations used in machine learning. The Valhall architecture supports multicore configurations with up to 32 shaders, but the G77 comes in 7- to 16-shader models.

Valhall replaces multiple execution engines in high-end Bifrost GPUs with a single 32-thread warp-based execution engine, which simplifies control and core interconnects. The new shader core also increases the texture-unit throughput from two to four pixels per cycle, and it employs a completely redesigned load/store cache. Despite the additional hardware, the Mali-G77 can deliver up to 40% more frames per second than a Mali-G76 in the same die area when running complex mobile games.

The new Mali-D77 display processor (DPU) builds on the predecessor Mali-D71, adding features designed specifically for head-worn devices. Arm optimized it to drive 3K-resolution (2,880x1,440) headset displays at 120fps, but it also supports 4K (4,320x2,160) at 90fps. For VR, the headset uses half the horizontal resolution for each eye. On standard displays that lack stereo images, the DPU can increase its 4K output to 120fps.
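
To put these display modes in perspective, the following sketch works out the per-eye resolution and raw pixel rate for each mode, using only the resolutions and frame rates quoted above. The helper names pixel_rate and per_eye are illustrative, and the calculation ignores blanking intervals and other display-timing overhead.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Raw pixels per second for a display mode (no blanking overhead)."""
    return width * height * fps


def per_eye(width: int, height: int) -> tuple[int, int]:
    """Per-eye resolution when each eye gets half the horizontal pixels."""
    return width // 2, height


# (width, height, fps) for the modes quoted in the text.
modes = {
    "3K VR at 120fps": (2880, 1440, 120),
    "4K VR at 90fps": (4320, 2160, 90),
    "4K non-stereo at 120fps": (4320, 2160, 120),
}

for name, (w, h, fps) in modes.items():
    eye_w, eye_h = per_eye(w, h)
    rate = pixel_rate(w, h, fps) / 1e6
    print(f"{name}: {eye_w}x{eye_h} per eye if split, {rate:.1f} Mpixels/s")
```

By this count, the 4K mode at 90fps moves about 1.7 times as many pixels per second as the 3K mode at 120fps, and in both VR modes each eye sees a square image.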

Microprocessor Report subscribers can access the full article:

https://www.linleygroup.com/mpr/article.php?id=12152

Achronix Debuts FPGAs for AI

By Tom R. Halfhill

As promised last year, Achronix is using its embedded-FPGA technology to build a new FPGA family optimized for machine learning and data throughput. To compete with Intel and Xilinx FPGAs in data centers, the new Speedster7t family employs faster DSP blocks, optional GDDR6 memory, 400 Gigabit Ethernet, PCI Express Gen5, 7nm FinFETs, and a custom on-chip network. Relative to competing products, its DSPs are more optimized for machine learning (ML) and its I/O interfaces are more geared to high bandwidth. One tradeoff, however, is that it lacks the Arm CPUs that other advanced FPGAs integrate for general embedded applications.

Differentiation is vital if Achronix is to avoid the chronic crashes that have thwarted other attempts to challenge the FPGA duopoly of Xilinx and Intel (Altera). One approach is to lay a custom interconnect over the conventional FPGA fabric, which is bounded by interfaces for high-speed networking and external memory. The on-chip network resembles a mesh but is actually a grid of Amba AXI buses. It's simpler than a true mesh and provides faster pathways between the chip's I/O interfaces and processing elements than the usual method of routing data through the configurable fabric.

To bring ASIC-like processing to this design, Achronix has also built new DSP blocks optimized for low-precision multiplication. Called machine-learning processors (MLPs), these hard-logic blocks are twice as powerful as those introduced last year in its SpeedCore7t embedded-FPGA intellectual property (eFPGA IP). With additional help from the fabric's multipliers, the top-end Speedster7t 6000 FPGA can execute up to 134 trillion operations per second (TOPS) on 8-bit-integer (INT8) data. Achronix says the initial Speedster7t design will tape out in June and sample in 4Q19.

Microprocessor Report subscribers can access the full article:

https://www.linleygroup.com/mpr/article.php?id=12153

About Linley Newsletter

Linley Newsletter is a free electronic newsletter that reports and analyzes advances in microprocessors, networking chips, and mobile-communications chips. It is published by The Linley Group. To subscribe, please visit:

http://www.linleygroup.com/newsletters/newsletter_subscribe.php
