Linley Newsletter: October 18, 2018

Linley Newsletter

Please feel free to forward this to your colleagues

Issue #622

October 18, 2018

Independent Analysis of Microprocessors and the Semiconductor Industry

Editor: Tom R. Halfhill

Contributors: Linley Gwennap, Mike Demler, Bob Wheeler

In This Issue:

- Turing T4 Targets AI Inference

- Cadence Mutates Its DNA to Boost AI

- Titan IC Floats 100Gbps Reg-Ex Engine

Turing T4 Targets AI Inference

By Linley Gwennap

Nvidia wants to take over the AI inference market, and its newest weapon is the Turing architecture. Whereas its predecessor, Volta, focuses on fast matrix multiplication for floating-point values, Turing adds support for more efficient integer data types that have become common in neural-network inference. As a result, the new architecture doubles Volta's per-core throughput at roughly the same power.

Rather than unleashing a full-blown Turing design to replace the high-end V100 board, the company deployed the new architecture only in smaller configurations that mainly target PC graphics (for both gamers and professionals). The V100, which just entered production late last year, remains Nvidia's primary data-center offering for neural-network training. For inference, the company began shipping a Tesla T4 card that uses the TU104 die and, at 70W, fits into a standard PCIe power envelope.

Most cloud-service providers run the majority of their AI inferencing on Intel Xeon processors. This approach treats inference like any other workload, allowing them to perform the task on any number of servers as needed to handle demand fluctuations. But the strong performance of the Tesla T4 should start to change this approach. The accelerator fits into a standard server but delivers 10x the ResNet-50 performance of a high-end Xeon Gold processor. It will continue to hold a strong lead even when Intel's Cascade Lake becomes broadly available. Nvidia also offers a complete software stack, including the TensorRT tool that converts trained neural networks from floating-point (FP) to integer weights and optimizes them.
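To illustrate what converting floating-point weights to integers involves, here is a minimal sketch of symmetric per-tensor INT8 quantization. It is a generic textbook scheme, not TensorRT's actual calibration procedure; the function names and sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to signed 8-bit integers using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    # Assumption: symmetric range [-127, 127]; an all-zero tensor gets scale 1.0.
    scale = max_abs / 127.0 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical sample weights, chosen only for illustration.
weights = [0.5, -1.27, 0.0, 0.254]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
```

Each restored value differs from the original by at most half the scale factor, which is the precision a scheme like this gives up in exchange for cheaper integer arithmetic.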

Microprocessor Report subscribers can access the full article:

Titan IC Floats 100Gbps Reg-Ex Engine

By Bob Wheeler

Let's start with the elephant in the room. Who names their company after a ship that sank on its first voyage? The answer is a university spinoff headquartered next to the Belfast, U.K., shipyard where the RMS Titanic was built. The CTO of Titan IC, Sakir Sezer, remains a professor at Queen's University Belfast, the incubator for the company's technology.

Since hiring its first employee in 2012, the team has grown to 27 people, backed by a combination of several million pounds in government grants and private funding. Titan's product is a regular-expression processor (RXP) that scales to 100Gbps of throughput. The company licenses it as intellectual property (IP) for use in SoCs, ASICs, and FPGAs.

Network security is the primary application for high-performance regular-expression (reg-ex) processing. Intrusion-detection systems and next-generation firewalls use reg-ex engines to scan packets for patterns that indicate an attack or malware. Most security appliances implement reg-ex searches in software, consuming many CPU cycles and limiting throughput under some conditions. Still, software approaches maximize flexibility, and the OEM can optimize its reg-ex engine for its own rule-set structure. A few OEMs design their own hardware engines, which are implemented in ASICs or FPGAs.
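As a rough illustration of the software approach, the sketch below scans a packet payload against a small rule set using Python's standard `re` module. The rule names and patterns are hypothetical and far simpler than a production intrusion-detection rule set.

```python
import re

# Hypothetical rules for illustration only; not taken from any real rule set.
RULES = {
    "sql-injection": re.compile(rb"union\s+select", re.IGNORECASE),
    "path-traversal": re.compile(rb"\.\./\.\./"),
}


def scan_payload(payload):
    """Return the names of all rules whose pattern occurs in the payload."""
    return [name for name, pattern in RULES.items() if pattern.search(payload)]


hits = scan_payload(b"GET /search?q=1 UNION  SELECT password FROM users")
```

Every payload is tested against every rule, so CPU cost grows with both traffic volume and rule count. That scaling is the throughput pressure a hardware engine is meant to relieve.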

Several embedded-processor vendors -- including Broadcom, Cavium, and NXP -- implemented their own hardware reg-ex engines and integrated them into SoCs. Eventually, however, most abandoned these internal efforts. Their reasons included performance limitations as well as customer difficulty in mapping rule sets to the peculiarities of a given engine design. Titan addresses these concerns with a combination of greater performance and configurations tuned to the customer's application.

Cadence Mutates Its DNA to Boost AI

By Mike Demler

Cadence's new DNA 100 sheds its predecessor's reliance on fully programmable architectures, integrating purpose-built neural-network hardware. The licensable core employs a scalable compute engine that supports configurations ranging from 256 multiply-accumulators (MACs) to 4,096 MACs. In a 16nm process, the design runs at 1.0GHz, delivering up to four trillion MACs per second (MAC/s). The company plans to make the intellectual property (IP) available to lead customers in December and to offer it for general licensing in 1Q19.
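The peak figure follows from simple arithmetic. A minimal sketch, assuming each MAC unit completes one multiply-accumulate per clock cycle (the function name is hypothetical):

```python
def peak_macs_per_second(num_macs, clock_hz):
    """Peak rate, assuming one multiply-accumulate per unit per cycle."""
    return num_macs * clock_hz


CLOCK_HZ = 1_000_000_000  # 1.0GHz in the 16nm process cited above

smallest = peak_macs_per_second(256, CLOCK_HZ)   # 2.56e11 MAC/s
largest = peak_macs_per_second(4096, CLOCK_HZ)   # 4.096e12, about four trillion MAC/s
```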

In addition to streamlining the execution pipeline, the DNA 100 reduces storage requirements and further increases processing efficiency by compressing activation/feature maps and convolution-weight parameters. The DMA block detects the zero-valued weights in filter matrices, storing only nonzero values in the coefficient memory. It also discards zeroes from the input data, as well as those produced by the activation functions. Furthermore, neural networks frequently reuse weights; the compiler identifies these repeats and eliminates the redundant storage.
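One common way to store only nonzero values is to pair them with a presence mask. The sketch below shows that idea in its simplest form; the mask encoding is an assumption for illustration, since the text does not describe the actual format the DNA 100 uses.

```python
def compress(values):
    """Split a vector into a presence mask and its nonzero entries."""
    mask = [v != 0 for v in values]
    nonzero = [v for v in values if v != 0]
    return mask, nonzero


def decompress(mask, nonzero):
    """Rebuild the original vector by reinserting zeroes where the mask is False."""
    remaining = iter(nonzero)
    return [next(remaining) if keep else 0 for keep in mask]


# Hypothetical filter weights, half of them zero.
weights = [0, 3, 0, 0, -2, 7, 0, 1]
mask, nonzero = compress(weights)
```

Here eight weights shrink to four stored values plus eight mask bits, so the saving grows with the fraction of zeroes.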

The compressed feature maps and coefficient arrays work with the sparse-compute engine to reduce MAC operations to only those producing nonzero results. Tensors fetched from memory include flag bits, which indicate matrix operations the engine can skip. According to the company's simulations, this approach effectively doubles the number of productive MAC operations per cycle for a neural network with 50% activation sparsity and 15% weight sparsity.
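A back-of-the-envelope check on the doubling claim is sketched below. It assumes zero activations and zero weights occur independently and that a skipped MAC costs nothing, so it gives an idealized bound rather than a model of the actual engine. The function names are hypothetical.

```python
def productive_fraction(activation_sparsity, weight_sparsity):
    """Share of MACs with both operands nonzero, assuming independent zeroes."""
    return (1 - activation_sparsity) * (1 - weight_sparsity)


def sparse_dot(activations, weights):
    """Dot product that skips any pair containing a zero; also counts MACs used."""
    total, macs_used = 0, 0
    for a, w in zip(activations, weights):
        if a == 0 or w == 0:
            continue  # the product would be zero, so no MAC is spent on it
        total += a * w
        macs_used += 1
    return total, macs_used


fraction = productive_fraction(0.50, 0.15)  # about 0.425
ideal_speedup = 1 / fraction                # about 2.35x under these assumptions
```

Under these assumptions only about 42.5% of the MACs do useful work, for an ideal gain of roughly 2.35x. That is consistent in magnitude with the simulated doubling, which would presumably include overheads this sketch ignores.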

But even if it falls short of the company's lofty projections, the core's new compression and sparsity features eliminate power wasted on useless operations, and it delivers greater per-cycle throughput than less sophisticated designs. It also retains Tensilica's traditional customization capabilities. By altering its DNA to combine programmability with dedicated hardware, Cadence has produced a deep-learning accelerator that can handle the most popular CNNs and RNNs in addition to supporting next-generation algorithms.

About Linley Newsletter

Linley Newsletter is a free electronic newsletter that reports and analyzes advances in microprocessors, networking chips, and mobile-communications chips. It is published by The Linley Group. To subscribe, please visit:
