Linley Newsletter: June 7, 2018

Linley Newsletter

(Formerly Processor Watch, Linley Wire, and Linley on Mobile)

Please feel free to forward this to your colleagues

Issue #603

June 7, 2018

Independent Analysis of Microprocessors and the Semiconductor Industry

Editor: Tom R. Halfhill

Contributors: Linley Gwennap, Mike Demler, Bob Wheeler

In This Issue:

- Cortex-A76 Revamps Core Design

- Mali 76-Series Boosts Flagship Phones

- TidalScale Inverts Server Virtualization

Cortex-A76 Revamps Core Design

By Linley Gwennap

Maintaining its annual cadence of high-end mobile CPUs, Arm's new Cortex-A76 delivers a big performance jump over Cortex-A75 using a completely revamped microarchitecture. The company expects the new design to increase per-clock performance (IPC) by 25%, which translates to a 35% gain when comparing this year's 10nm A75 to next year's 7nm A76. Production RTL shipped late last year, and the first smartphones based on the A76 could appear early in 2019. The first processor to use it will likely be Qualcomm's next-generation Snapdragon 855.
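The two percentages can be reconciled with simple arithmetic. The sketch below assumes that performance scales as IPC multiplied by clock frequency, so the 35% overall gain divides into the 25% IPC gain and an implied clock-speed gain of about 8%, presumably from the move to 7nm. The article does not state clock speeds, and the function name is hypothetical.

```python
def implied_clock_gain(total_gain: float, ipc_gain: float) -> float:
    """Return the fractional clock-speed gain implied by a total gain and an IPC gain.

    Assumes performance = IPC x clock frequency, so
    (1 + total_gain) = (1 + ipc_gain) * (1 + clock_gain).
    """
    return (1.0 + total_gain) / (1.0 + ipc_gain) - 1.0


if __name__ == "__main__":
    # Figures from the article: 25% IPC gain, 35% overall gain (10nm A75 -> 7nm A76).
    clock = implied_clock_gain(total_gain=0.35, ipc_gain=0.25)
    print(f"Implied clock-speed gain: {clock:.1%}")
```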

If the A76 meets its goals, it will provide the biggest performance gain for smartphones in four years. Arm has consistently delivered a new high-end mobile CPU every year, with the previous three generations providing IPC gains averaging 11%; IC-process improvements have also benefited smartphone users, raising the average annual performance gain to 23%. The A76 far exceeds this pace, marking the biggest jump since Arm moved from the 32-bit Cortex-A15 to the 64-bit Cortex-A57, a transition that took two years. Although the new CPU uses more transistors than the A75, it still improves power and area efficiency.

The A76's performance gains come from a variety of microarchitecture enhancements over the previous A75. The most significant is a fourth decoder, raising the peak throughput to four instructions per cycle. Compared with the A75, the new design additionally features better branch prediction, a larger instruction-reorder window, a third integer ALU, faster floating-point operations, and a shorter mispredicted-branch penalty. Arm also doubled memory bandwidth, reduced cache latencies, improved the prefetcher, and reduced the number of TLB misses. Taken together, these changes bring the core's IPC close to that of Intel's Skylake design, making it suitable for Windows-based laptops as well as premium smartphones.

Microprocessor Report subscribers can access the full article:

Mali 76-Series Boosts Flagship Phones

By Mike Demler

Arm has strengthened its hold on the premium-smartphone-processor market by adding two new products to its high-end Mali graphics and multimedia lineup. The Mali-G76 GPU and Mali-V76 video-processing-unit (VPU) cores target SoCs for flagship phones, virtual-reality (VR) headsets, and next-generation 8K TVs. They join the company's new top-tier "76" series, whose numbering aligns with the concurrently announced Cortex-A76.

As the new head of the GPU lineup, the Mali-G76 provides 30% better area efficiency than the previous Mali-G72. It increases performance by doubling the execution-engine width from four lanes to eight, carrying forward the same lane-widening scheme the company recently introduced in the midrange Mali-G52. The wider execution units enable the G76 to process workloads in less time than the G72, which (accounting for control overhead) yields 30% average energy savings. Designers can use those savings to extend battery life or deliver a 30% performance boost at the same power.
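One way to see why wider execution engines can save energy is to count instruction issues. The idealized model below, with a hypothetical function name and workload size, computes how many times a single instruction must issue to cover a group of threads on a 4-lane engine (as in the G72) versus an 8-lane engine (as in the G76). Doubling the width halves the issue count, so fixed per-issue control costs such as fetch, decode, and scheduling are spread over twice as many threads. The model ignores those costs themselves and everything else that determines real energy use.

```python
import math


def issue_slots(num_threads: int, lanes: int) -> int:
    """Times one instruction must issue to cover num_threads threads on an
    engine that executes `lanes` threads per issue (idealized model)."""
    return math.ceil(num_threads / lanes)


if __name__ == "__main__":
    threads = 1024  # hypothetical workload size
    for name, lanes in (("4-lane engine (G72-style)", 4), ("8-lane engine (G76-style)", 8)):
        print(f"{name}: {issue_slots(threads, lanes)} issues per instruction")
```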

Joining Arm's 76ers family is the Mali-V76. This new VPU supports 60fps decoding of 8K-resolution video, and it can encode that next-generation format at 30fps. Although TV manufacturers have introduced high-end 8K models, little content is available at that resolution, so 8K TVs mostly just scale up 4K video. The V76's 8K capability is also useful for sending separate left/right 4K images to VR headsets and for streaming multiple images to video walls.
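The required throughput follows from the pixel count. The sketch below assumes that 8K means 7680×4320 and 4K means 3840×2160 (the consumer UHD formats), and that the VR case runs two 4K streams at 60fps, a frame rate the article does not specify. Under those assumptions, 8K60 decoding handles about 2 billion pixels per second, four times the 4K60 rate, and two 4K60 streams need half that budget. The names are illustrative.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels per second a video engine must process at the given format and frame rate."""
    return width * height * fps


UHD_8K = (7680, 4320)  # assumed 8K format
UHD_4K = (3840, 2160)  # assumed 4K format

if __name__ == "__main__":
    decode_8k60 = pixel_rate(*UHD_8K, 60)
    encode_8k30 = pixel_rate(*UHD_8K, 30)
    # Assumed VR case: separate left/right 4K images, each at 60fps.
    stereo_4k60 = 2 * pixel_rate(*UHD_4K, 60)
    print(f"8K60 decode: {decode_8k60 / 1e9:.2f} Gpixel/s")
    print(f"8K30 encode: {encode_8k30 / 1e9:.2f} Gpixel/s")
    print(f"Two 4K60 streams: {stereo_4k60 / 1e9:.2f} Gpixel/s")
```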

Microprocessor Report subscribers can access the full article:

TidalScale Inverts Server Virtualization

By Bob Wheeler

We seldom write about software companies, but TidalScale offers a software-based alternative to the proprietary hardware traditionally employed in scale-up servers. First, forget everything you know about virtual machines (VMs). Whereas traditional server virtualization divides one physical server into many small VMs, TidalScale enables one massive virtual machine running across many physical servers. Its approach creates a virtual scale-up server running a single operating-system instance on hardware composed of commodity x86 servers. This approach benefits applications requiring more memory than a single commodity server provides, such as in-memory databases, analytics, and scientific computing.

Other large-scale nonuniform-memory-access (NUMA) systems rely on high-speed, low-latency interconnects to minimize the cost of remote-memory access. TidalScale's innovation is its use of machine learning to dynamically characterize resource consumption and then redistribute resources to minimize communication between physical nodes. In many cases, its HyperKernel moves a process to the node that holds the data it needs rather than moving that memory to the process's original node.
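The migrate-or-fetch choice can be illustrated with a toy decision rule. TidalScale uses machine learning for this characterization; the sketch below substitutes a simple access-count heuristic and is not TidalScale's algorithm. All names are hypothetical: when a process touches a page held by another node, it moves to that node if its recent accesses have favored that node's memory, and otherwise the page moves to the process.

```python
from collections import Counter
from typing import Sequence


def resolve_remote_access(process_node: int, page_node: int,
                          recent_access_nodes: Sequence[int]) -> str:
    """Decide how to handle a process touching a page stored on another node.

    Hypothetical heuristic (not TidalScale's method): migrate the process if
    more of its recent accesses hit the page's node than its own node;
    otherwise move the page to the process's node.
    """
    if process_node == page_node:
        return "local"
    counts = Counter(recent_access_nodes)
    if counts[page_node] > counts[process_node]:
        return "migrate_process"
    return "move_page"


if __name__ == "__main__":
    # A process on node 0 whose recent accesses mostly hit node 2's memory.
    history = [2, 2, 0, 2, 1, 2]
    print(resolve_remote_access(process_node=0, page_node=2, recent_access_nodes=history))
    # prints "migrate_process"
```

Moving the computation is attractive when its working set already sits on the remote node, because one migration replaces many page transfers.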

The distributed HyperKernels create a single memory image, with each node's physical memory operating as a level-four (L4) cache. More than a decade ago, AMD and Intel added two-level (nested) page tables to their processors to accelerate hypervisor-based virtualization. Those same extensions make TidalScale's HyperKernel completely transparent to the guest operating system (OS). In fact, the HyperKernel doesn't distinguish between OSs and applications, neither of which requires modification.

Many innovative processor and system architectures are solutions in search of a problem. TidalScale's approach, however, appears well suited to in-memory computing, which has exploded thanks to big data. Scale-out software, such as Hadoop, introduces performance-robbing overhead and can require time-consuming data-set partitioning. By turning virtualization on its head, TidalScale expands memory in a new way, obviating the need for physical scale-up servers.

Microprocessor Report subscribers can access the full article:

About Linley Newsletter

Linley Newsletter is a free electronic newsletter that reports and analyzes advances in microprocessors, networking chips, and mobile-communications chips. It is published by The Linley Group and consolidates our previous electronic newsletters: Processor Watch, Linley Wire, and Linley on Mobile. To subscribe, please visit:

Our privacy policy has been updated to incorporate GDPR considerations:
