Linley Newsletter: March 14, 2019

Please feel free to forward this to your colleagues

Issue #643

March 14, 2019

Independent Analysis of Microprocessors and the Semiconductor Industry

Editor: Tom R. Halfhill

Contributors: Linley Gwennap, Mike Demler, Bob Wheeler

In This Issue:

- Exynos 9820 Has Samsung AI Engine

- Cadence ConnX Sensors and Radios

- ST Debuts Its First Application SoCs

Exynos 9820 Has Samsung AI Engine

By Linley Gwennap

Samsung's new flagship Galaxy S10 smartphone introduces the company's next high-end processor, the Exynos 9820. The new chip combines Samsung and third-party cores to form a differentiated design. Among the in-house cores is a new deep-learning accelerator (DLA) that the company disclosed at the recent ISSCC. The DLA employs a dual-core design to achieve a peak rate of two trillion operations per second (TOPS). The ISSCC paper details unique features that can boost throughput by up to 3.5x on sparse neural networks.

At a high level, the DLA comprises two compute cores plus common control logic. After starting with a total of 1,024 multiply-accumulate (MAC) units, the designers split them into two cores to simplify routing and clock synchronization. Each core can independently compute convolution results at the peak rate, since the convolution layers require no intercore routing. Fully connected layers, as the name implies, require additional connections between the cores, but these layers occur less often and so can progress more slowly.
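The figures above allow a quick back-of-the-envelope check. The sketch below assumes the 1,024 MAC units are split evenly between the two cores and that each MAC counts as two operations (one multiply plus one add), a common convention that the article does not state explicitly. Under those assumptions, it derives the MAC count per core and the clock rate implied by the 2 TOPS peak. The function names are illustrative only.

```python
# Rough arithmetic on the DLA figures quoted in the article.
# Assumptions (not stated in the article): an even split of MAC units
# between the cores, and two operations (multiply + add) per MAC per cycle.

MAC_UNITS = 1024          # total multiply-accumulate units (from the article)
CORES = 2                 # dual-core design (from the article)
PEAK_OPS_PER_SEC = 2e12   # two trillion operations per second (from the article)
OPS_PER_MAC = 2           # assumed convention: one multiply and one add


def macs_per_core(total_macs: int, cores: int) -> int:
    """MAC units per core, assuming an even split."""
    return total_macs // cores


def implied_clock_hz(peak_ops_per_sec: float, total_macs: int,
                     ops_per_mac: int = OPS_PER_MAC) -> float:
    """Clock rate at which every MAC must fire each cycle to hit the peak."""
    return peak_ops_per_sec / (total_macs * ops_per_mac)


if __name__ == "__main__":
    print("MACs per core:", macs_per_core(MAC_UNITS, CORES))
    clock = implied_clock_hz(PEAK_OPS_PER_SEC, MAC_UNITS)
    print(f"Implied clock: {clock / 1e6:.1f} MHz")
```

Under these assumptions the implied clock is a little under 1GHz; the actual operating frequency may differ if the quoted peak rate is rounded.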

In addition to the new neural engine, the 9820 packs two custom M4 CPUs, two Cortex-A75s, and four Cortex-A55s in a three-level arrangement that combines the M4's excellent single-thread performance with the smaller die and greater power efficiency of the Cortex designs. The chip includes a 12-core Mali-G76 GPU and a Samsung modem that can hit 2.0Gbps for downloads and 316Mbps for uploads. Samsung manufactures the 9820 in its 8nm FinFET process.

Previous Exynos chips featured a DLA based on a design licensed from DeePhi, a Chinese vendor. Last year, however, Xilinx acquired DeePhi and refocused its efforts on FPGA-based designs. By that time, Samsung was already well along with its in-house effort. The company developed the new DLA in Korea, unlike its custom CPUs, which it develops in Texas.

Microprocessor Report subscribers can access the full article:

Cadence ConnX Sensors and Radios

By Mike Demler

Autonomous-vehicle sensors and 5G radios produce very different signals, but Cadence designed its new ConnX DSPs to handle both. It developed two models: the B20 has a 512-bit vector engine and the B10 has a 256-bit vector engine. These intellectual-property (IP) cores take over the top spot in the ConnX lineup, offering higher performance and precision than the ConnX BBE-series baseband engines Tensilica introduced in 2009.

Because the new designs are software compatible with their predecessors, customers can reuse code written for the older cores. Cadence is accepting requests for early access now, and it plans to release both DSPs for general availability in 2Q19.

To handle complex signal-processing requirements, the new designs provide an extensive set of configuration options. The company offers a high-precision option for both new DSPs, adding 32-bit fixed-point compute units to the 16-bit units in the base models. This option also doubles throughput for 16-bit operations. A vector FPU (VFPU) is also optional and comes in three flavors: single-precision (FP32), single-precision extended (SPX) with twice as many FPUs, and half-precision extended (HPX or FP16). The communications-acceleration option includes specialized hardware for 4G and 5G modems. Designers can further customize the DSPs using the Tensilica Instruction Extension (TIE) language.
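As a rough illustration of what the vector widths mean for these data types, the sketch below divides each model's vector width by the element size to get a nominal lane count. This is simple arithmetic under the assumption that a vector register is filled with same-size elements; it does not describe the actual number of MAC units or FPUs in either DSP, which depends on the configuration options chosen.

```python
# Nominal elements per vector for the two ConnX models.
# Vector widths come from the article; the lane counts are plain
# width / element-size arithmetic, not vendor-published MAC counts.

VECTOR_WIDTH_BITS = {"B20": 512, "B10": 256}
ELEMENT_BITS = (16, 32)  # 16-bit base units, 32-bit high-precision option


def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of same-size elements that fit in one vector."""
    if vector_bits % element_bits != 0:
        raise ValueError("element size must divide the vector width")
    return vector_bits // element_bits


if __name__ == "__main__":
    for model, width in VECTOR_WIDTH_BITS.items():
        for bits in ELEMENT_BITS:
            print(f"{model}: {lanes(width, bits)} x {bits}-bit elements per vector")
```

By this measure, the B20 holds twice as many elements per vector as the B10 at either precision.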

Before introducing these new DSPs, Cadence hadn't upgraded the ConnX lineup since its 2013 acquisition of Tensilica. Numerous IP vendors have produced deep-learning accelerators for advanced driver-assistance systems (ADASs), but the B20 and B10 fill a need for front-end processing of lidar and radar data. Designers will need to sort through the extensive options to optimize the DSPs for their applications, but the comprehensive Eclipse-based Xtensa Xplorer integrated-development environment (IDE) eases that task. The new B20 and B10 put the company back in the high-performance-DSP race.

Microprocessor Report subscribers can access the full article:

ST Debuts Its First Application SoCs

By Tom R. Halfhill

Relatively few embedded processors integrate an application CPU with a 3D GPU and real-time microcontroller core. Even fewer can operate in the subwatt range when running full tilt. STMicroelectronics is joining this exclusive club with its new STM32MP1 family, which extends the existing STM32 microcontrollers into the realm of full-fledged SoCs.

The superset design is the STM32MP157, which features two Arm Cortex-A7 CPUs for application software, a Cortex-M4F coprocessor for real-time control, a VeriSilicon 3D GPU, and numerous on-chip memories, peripherals, and I/O interfaces. The STM32MP153 drops the GPU and its display interface, and the STM32MP151 drops those features, one Cortex-A7, and two I/O ports. All began production in February. Even the top-end model typically consumes only 500mW.

ST is targeting general-purpose embedded systems that would otherwise employ a separate application processor for the high-level software and an MCU for real-time control. Some examples are factory machines, medical devices, and home appliances that have graphical user interfaces -- including some battery-powered products. The only markets excluded are automotive and aerospace.

The STM32MP157 is stuffed with features normally expected of a well-appointed 32-bit MCU, save one: it omits on-chip flash memory, despite being manufactured in flash-friendly 40nm technology. Designed for systems requiring more memory than most MCUs offer, it relies on external NAND or NOR flash for nonvolatile storage and external DRAM for working memory. But it does have four blocks of on-chip SRAM totaling 708KB, plus a 256KB L2 cache for the application CPUs and 3KB of one-time-programmable (OTP) memory for secure storage.

The STM32MP1 family competes with similar SoCs that integrate application processing and real-time control -- particularly, NXP's i.MX family and Texas Instruments' Sitara family. ST is quoting similar prices and power levels. The STM32MP1 brings fresh competition to this market segment.

Microprocessor Report subscribers can access the full article:

About Linley Newsletter

Linley Newsletter is a free electronic newsletter that reports and analyzes advances in microprocessors, networking chips, and mobile-communications chips. It is published by The Linley Group. To subscribe, please visit:
