The Future Of Graphic And Mobile Memory For New Applications


Author: Jin Kim
Domain: High Tech | Category: Semiconductors
Short URL: https://www.wesrch.com/electronics/pdfEL1SE1000CYOO
The future of graphic and mobile memory for new applications
August 21st, 2016 | JIN KIM | Samsung Electronics

Disclaimer
This presentation is intended to provide information concerning the memory industry. We do our best to make
sure that the information presented is accurate and fully up-to-date. However, the presentation may be subject
to technical inaccuracies, information that is not up-to-date, or typographical errors. As a consequence,
Samsung does not in any way guarantee the accuracy or completeness of the information provided in this
presentation. Samsung reserves the right to make improvements, corrections and/or changes to this
presentation at any time.
The information in this presentation or accompanying oral statements may include forward-looking
statements. These forward-looking statements include all matters that are not historical facts, statements
regarding Samsung Electronics' intentions, beliefs or current expectations concerning, among other
things, market prospects, growth, strategies, and the industry in which Samsung operates. By their nature,
forward-looking statements involve risks and uncertainties, because they relate to events and depend on
circumstances that may or may not occur in the future. Samsung cautions you that forward-looking
statements are not guarantees of future performance and that the actual developments of Samsung, the
market, or industry in which Samsung operates may differ materially from those made or suggested by the
forward-looking statements contained in this presentation or in the accompanying oral statements. In
addition, even if the information contained herein or the oral statements are shown to be accurate, those
developments may not be indicative of developments in future periods.

Contents
• Memory technology trend
• High speed graphic technology (>10Gbps)
• Low power mobile technology (>20%)
• Conclusion

Memory technology trend


Memory is at the core of new applications
• Higher Performance: x10 bandwidth, from 30GB/s (GDDR5) to 256GB/s (HBM2)
• Lower Power: x0.5 power efficiency, with relative values 1, 0.7 and 0.5 shown for LP3, LP4 and LP4X
[Figure: Memory-Centric Computing surrounded by new applications: Autonomous, Artificial Intelligence, Virtual Reality, Computer Vision]
Source: Samsung

Memory-centric system evolution

• Extreme B/W, performance/power, data processing, cost-effective solutions
[Figure: memory evolution over time against efficiency (perform./power, cost). SoC side: Core Clock, Multi-Core, Memory Wall; Data Traffic, Cost, Thermal; A.I., VR/MR, Vision. Memory side: perform extension with DDR5/LP5/GDDR6 (lower power, noise immune, high speed); value, UX with Low Cost HBM/PIM (off-loading, customized processing); PC/Server, Mobile, Gfx.]

Memory technology trend
• GDDR6 with over 14Gbps, beyond 10Gbps GDDR5
• LP5, 20% more power-efficient than LP4X

[Figure: two trend charts over 2016, 2018 and 2020. Performance [Gbps/pin], axis 3 to 15, and Power Efficiency [mW/GBps], axis 20% to 100%. Series: DDR3, DDR4, DDR5, LP3, LP4, LP4X, LP5, GDDR5, GDDR6.]
Source: ISCA2016, Samsung

High Bandwidth Memory: HBM
• High Bandwidth: 1TB/s, 1,024 I/O architecture
• 8H stacked 20nm 8GB HBM with TSV technology
• Benefits: Performance x2.7 and Power Efficiency x0.8, HBM vs. GDDR5
[Figure: HBM cross-section. DRAM stack with TSVs and microbumps on a buffer die, beside a logic processor, on a Si interposer over the PCB.]
Source: Samsung

Processing In Memory: PIM
• Fill the performance gap and deliver energy-efficient solutions

• Processing In-Memory: better parallelism and lower bus traffic
• Memory off-loading for lower frequency and power
[Figure: AP (CPU, GPU/VPU) connected to DRAM, with two options: Processing In Buffer and Processing In DRAM]
Source: Samsung

High speed graphic technology (>10Gbps)
• Graphic application requirement
• Asymmetric System, Crosstalk, EQ tuning
• GDDR6, Low cost HBM, PIM

High speed memory requirement
• 4K real infographic virtual reality needs 13.2GB of memory at about 1TB/s
• 4K 3D mixed reality needs an additional 3.5GB of memory at 151GB/s
[Figure: four bar charts by resolution (QHD, 4K UHD, 8K UHD), each with Main and H/E bars. Gaming Virtual Reality memory: Gfx Capacity [GB] and B/W [GB/s]. Mixed Reality memory: Added Capacity [GB] and B/W [GB/s]. Labeled values include 13.2GB and 1064GB/s for 4K UHD virtual reality, and 3.5GB and 151GB/s for 4K UHD mixed reality.]
Variable assumptions: poly count, fps, # of textures per fragment, cache hit rate, tri-linear filtering,
# of virtual light sources, reflection/refraction ratio, ray bounce depth
Source: Samsung
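
To put the roughly 1TB/s requirement in perspective, the sketch below estimates how many memory devices it would take to reach it. It assumes peak bandwidth is simply data pins times per-pin rate divided by 8. The 1064GB/s target is the chart value read here as the 4K UHD virtual reality requirement, the x32 width for GDDR devices is an assumption that is not on the slide, and the function names are illustrative only.

```python
import math


def peak_bandwidth_gbytes(data_pins, gbps_per_pin):
    """Peak bandwidth of one device in GB/s: pins x Gbps per pin / 8 bits per byte."""
    return data_pins * gbps_per_pin / 8


def devices_needed(target_gbytes, data_pins, gbps_per_pin):
    """Smallest number of devices whose combined peak bandwidth meets the target."""
    per_device = peak_bandwidth_gbytes(data_pins, gbps_per_pin)
    return math.ceil(target_gbytes / per_device)


# GB/s, chart value read as the roughly 1TB/s need for 4K UHD virtual reality.
TARGET_VR_4K = 1064

# (data pins, Gbps per pin). The x32 GDDR device width is an assumption.
OPTIONS = {
    "GDDR5 x32 @ 10Gbps": (32, 10.0),
    "GDDR6 x32 @ 14Gbps": (32, 14.0),
    "HBM2 1024 I/O @ 2Gbps": (1024, 2.0),
}

for name, (pins, rate) in OPTIONS.items():
    per_device = peak_bandwidth_gbytes(pins, rate)
    count = devices_needed(TARGET_VR_4K, pins, rate)
    print(f"{name}: {per_device:.0f} GB/s per device, {count} devices")
```

Under these assumptions the target takes far fewer wide HBM2 stacks than narrow GDDR devices, which is the gap the following slides address from both directions: faster pins and cheaper wide interfaces.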

Asymmetric system for higher data rate
• GPU and DRAM each focus on their own dedicated features to maximize data rate
‒ Smart GPU: training (per-bit timing/EQ) to minimize static offset/noise
‒ Noise-immune DRAM: minimizes dynamic noise (jitter, ISI/X-talk, clock duty/skew)
[Figure: GPU-DRAM link. GPU labels: training (timing/EQ), PLL/DLL, clock phase controller, phase detector, data Tx/Rx, CTLE, calibration data. DRAM labels: noise immune circuit/PKG, DRAM core, to EDC pin. Signals: CK_t/CK_c, WCK_t/WCK_c, CA[0:9], DQ[0:7]. Noise sources: board/PKG SI/PI, jitter, ISI, X-talk.]
Source: Samsung
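
The slide does not describe the training algorithm itself. As a minimal sketch of what per-bit timing training could look like, the code below assumes the GPU sweeps a delay setting for each DQ pin, records pass/fail for a test pattern at every tap, and then parks each pin in the middle of its widest passing window. The sweep data, pin names and function names are hypothetical.

```python
def widest_passing_window(results):
    """Return (first, last) tap indices of the longest run of passing taps, or None."""
    best = None
    start = None
    # A trailing False closes a run that reaches the last tap.
    for i, ok in enumerate(list(results) + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if best is None or i - start > best[1] - best[0] + 1:
                best = (start, i - 1)
            start = None
    return best


def train_per_bit(sweeps):
    """Map each pin to the delay tap at the centre of its widest passing window."""
    settings = {}
    for pin, results in sweeps.items():
        window = widest_passing_window(results)
        if window is None:
            raise ValueError(f"no passing delay tap found for {pin}")
        settings[pin] = (window[0] + window[1]) // 2
    return settings


# Hypothetical sweep: True means the test pattern was read back correctly.
SWEEPS = {
    "DQ0": [False, False, True, True, True, True, True, False],
    "DQ1": [False, True, True, True, False, False, True, False],
}

print(train_per_bit(SWEEPS))
```

Choosing the window centre per pin removes static skew between bits, which is the part of the noise budget the slide assigns to the GPU; the dynamic noise that moves the window edges is left to the DRAM design.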

X-talk reduction for Board/PKG design
• Small X-talk Package : reduction of X-talk with better return path
• Crosstalk Reduction with coding : 3B4B, 8B9B
[Figure: small X-talk PKG requirement compared with GDDR5, and crosstalk reduction with 3B4B encoding. ICR: Insertion loss to Crosstalk Ratio.]
Source: Samsung
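
The slide names 3B4B coding but does not give the codebook. As an assumed illustration only, the sketch below builds one known kind of crosstalk-avoidance code, a forbidden-transition code, in which no two adjacent wires ever switch in opposite directions between consecutive codewords. On 4 wires such a code has exactly 8 codewords, enough to carry 3 bits. The banned-pattern rule and all names are assumptions, not the code used in the presentation.

```python
from itertools import product

WIRES = 4

# Assumed rule: at each adjacent wire pair, ban one of the patterns (0,1) / (1,0),
# alternating along the bus. If a pattern never occurs, its opposite can never
# flip into it, so opposing transitions on that pair are impossible.
BANNED = [(0, 1), (1, 0), (0, 1)]


def allowed(word):
    """True if the word avoids the banned pattern at every adjacent wire pair."""
    return all((word[i], word[i + 1]) != BANNED[i] for i in range(WIRES - 1))


CODEBOOK = [w for w in product((0, 1), repeat=WIRES) if allowed(w)]


def encode(value):
    """Map a 3-bit value (0..7) to a 4-wire codeword."""
    return CODEBOOK[value]


def decode(word):
    """Map a 4-wire codeword back to its 3-bit value."""
    return CODEBOOK.index(tuple(word))


def has_opposing_transition(prev, curr):
    """True if some adjacent wire pair switches in opposite directions."""
    for i in range(WIRES - 1):
        if (curr[i] - prev[i]) * (curr[i + 1] - prev[i + 1]) == -1:
            return True
    return False


print(len(CODEBOOK), "codewords:", CODEBOOK)
```

The cost is one extra wire per three data bits, traded for removing the worst-case coupling pattern between neighbours.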

DFE for return-loss reduction on system
• Single-ended signaling requires a noise-immune equalizer
‒ DFE* is more suitable than CTLE**
‒ CTLE and DFE are periodically calibrated by the GPU
‒ Adopt a merged summer/sampler for fast feedback
[Figure: DQ transceiver. RX: EQ (CTLE & DFE), quarter-rate DFE with summer in sampler, FIFO. TX: FIFO, MUX. Clocking: 8GHz WCK/WCKB, CLK buffer, /2 divider, 4GHz.]
* Decision Feedback Equalization
** Continuous Time Linear Equalization
Source: Samsung
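
To show why decision feedback helps, here is a minimal textbook sketch of a symbol-rate DFE, not the quarter-rate merged summer/sampler circuit on the slide. It assumes a noiseless channel with two post-cursor taps whose combined ISI exceeds the main cursor, so a plain slicer makes errors while the DFE, which subtracts the ISI of already-decided symbols, does not. The tap values and function names are hypothetical.

```python
def channel(symbols, taps):
    """Causal FIR channel model; taps[0] is the main cursor, the rest are post-cursors."""
    received = []
    for n in range(len(symbols)):
        y = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                y += h * symbols[n - k]
        received.append(y)
    return received


def slicer(samples):
    """Decide each sample on its own, with no equalization."""
    return [1 if y >= 0 else -1 for y in samples]


def dfe(samples, post_taps):
    """Subtract the ISI predicted from earlier decisions, then slice."""
    decisions = []
    for n, y in enumerate(samples):
        for k, h in enumerate(post_taps, start=1):
            if n - k >= 0:
                y -= h * decisions[n - k]
        decisions.append(1 if y >= 0 else -1)
    return decisions


# Hypothetical channel: main cursor 1.0, post-cursors 0.6 and 0.5.
TAPS = [1.0, 0.6, 0.5]
SENT = [1, -1, -1, 1, -1, 1, 1, -1, -1, -1, 1]
RECEIVED = channel(SENT, TAPS)

print("slicer errors:", sum(a != b for a, b in zip(slicer(RECEIVED), SENT)))
print("DFE errors:   ", sum(a != b for a, b in zip(dfe(RECEIVED, TAPS[1:]), SENT)))
```

Because the feedback works on decided symbols rather than on the analog waveform, it cancels ISI without amplifying noise, which is one common argument for preferring DFE over CTLE on a noisy single-ended link. The price is a tight feedback loop, which is what the merged summer/sampler addresses.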

GDDR6 ideas
• High speed signaling, 14Gbps ~ 16Gbps, 1.35V
‒ Low jitter clocking with WCK/byte, per-bit RX/TX equalizer training, X-talk reduction
‒ 2 channels with BL16, same Clock/ADD freq., twice the WCK/DQ freq.

Target Timing
            GDDR5       GDDR6
CK          1.75Gbps    1.75Gbps
CMD         1.75Gbps
ADDR / CA   3.5Gbps     3.5Gbps
WCK         3.5Gbps     7Gbps
DQ          7Gbps       14~Gbps

[Figure: WCK clocking. GPU with GPLL drives a 7GHz ~ 8GHz WCK through the WCK tree, distributed per byte instead of per word, to the DQ TX/RX at 14Gbps ~ 16Gbps in the noise-immune DRAM. RD and WR paths shown.]
Source: Samsung
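
The target timing values can be checked against the "same Clock/ADD freq., twice the WCK/DQ freq." bullet with a few ratios. The rates below are copied from the slide. The channel widths used for the access-size comparison (one 32-bit channel with BL8 for GDDR5, 16-bit channels with BL16 for GDDR6) are assumptions that do not appear on the slide.

```python
# Rates as labelled on the slide, in Gbps.
GDDR5 = {"CK": 1.75, "ADDR": 3.5, "WCK": 3.5, "DQ": 7.0}
GDDR6 = {"CK": 1.75, "CA": 3.5, "WCK": 7.0, "DQ": 14.0}


def ratio(rates, fast, slow):
    """How many times faster one signal runs than another within a generation."""
    return rates[fast] / rates[slow]


def access_bytes(channel_width_bits, burst_length):
    """Bytes moved by one burst on one channel."""
    return channel_width_bits * burst_length // 8


print("DQ per CK, GDDR5:", ratio(GDDR5, "DQ", "CK"))
print("DQ per CK, GDDR6:", ratio(GDDR6, "DQ", "CK"))
print("WCK scaling:", GDDR6["WCK"] / GDDR5["WCK"])
print("DQ scaling:", GDDR6["DQ"] / GDDR5["DQ"])

# Assumed channel widths: the burst doubles while the channel halves,
# so the bytes per access stay the same.
print("GDDR5 access:", access_bytes(32, 8), "bytes")
print("GDDR6 access:", access_bytes(16, 16), "bytes")
```

Under the assumed widths, splitting the device into two narrower channels with a longer burst lets the data rate double while the command clock and the access size seen by the GPU stay put.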

Low cost HBM for consumer segment
• ~200GB/s with a smaller # of TSVs compared to HBM2
‒ Cost competitiveness: remove buffer die, reduce # of TSV, organic interposer, etc.
‒ Need inputs from the client segment for specific features

Comparison
             HBM2     Low cost HBM
I/O          1024     ~512
Pin speed    2Gbps    3Gbps ~
BW (GB/s)    256      ~ 200
Cost/GB      1        0.X

Challenges
1. IO reduction, smaller # of TSV
2. Remove buffer die
3. Master/Slave structure
4. Remove ECC
5. Si or organic interposer

[Figure: challenge for HBM. Callouts 1 to 5 on a cross-section of the DRAM stack, buffer, logic processor, Si interposer and PCB.]
Source: Samsung
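
The comparison table is internally consistent if bandwidth is taken as I/O count times pin speed divided by 8. The short sketch below, with an illustrative function name, inverts that relation to find the pin speed each configuration needs.

```python
def required_pin_speed_gbps(bandwidth_gbytes, io_count):
    """Per-pin rate in Gbps needed to reach a bandwidth in GB/s over io_count pins."""
    return bandwidth_gbytes * 8 / io_count


hbm2 = required_pin_speed_gbps(256, 1024)
low_cost = required_pin_speed_gbps(200, 512)

print(f"HBM2, 1024 I/O at 256 GB/s: {hbm2} Gbps per pin")
print(f"Low cost HBM, 512 I/O at 200 GB/s: {low_cost} Gbps per pin")
```

Halving the I/O count, and with it the TSV count, while keeping about 200GB/s means each pin has to run at a little over 3Gbps, which matches the "3Gbps ~" entry in the table.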

PIM, Deep Learning in DRAM
• Parallel processing in buffer to reduce the extreme bandwidth requirement
‒ Convolution, subsampling, matrix calculation
• Collaborate with accelerator for performance/cost
[Figure: left, Extreme B/W Requirement: from CPU + GPU (GPGPU) + HBM/GDDRx to CPU + accelerators (x10 # of cores) + xHBMs*. Right, Processing in Buffer for data movement reduction: DRAM stacks with Deep Learning In Buffer (Convolution / Subsampling).]
* xHBM: Extreme HBM
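
A toy model can illustrate the data-movement argument. The sketch below runs a convolution followed by subsampling and compares how many values would cross the memory bus if the host did the work (the whole input is read out) versus if the buffer did it (only the subsampled result is read out). Everything here is an assumption made for illustration: the 6x6 input, the 3x3 kernel, 2x2 max pooling as the form of subsampling, and the traffic model that ignores kernel weights.

```python
def convolve2d_valid(image, kernel):
    """Plain 2D convolution without padding (no kernel flip, as is usual in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def subsample_max(fmap, step=2):
    """Keep the maximum of each non-overlapping step x step block."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(step) for j in range(step))
            for c in range(0, len(fmap[0]) - step + 1, step)
        ]
        for r in range(0, len(fmap) - step + 1, step)
    ]


def elements(matrix):
    """Number of values in a rectangular matrix."""
    return len(matrix) * len(matrix[0])


IMAGE = [[r * 6 + c for c in range(6)] for r in range(6)]
KERNEL = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

POOLED = subsample_max(convolve2d_valid(IMAGE, KERNEL))

host_traffic = elements(IMAGE)       # host reads the full input over the bus
in_buffer_traffic = elements(POOLED)  # only the reduced result leaves the memory

print("values moved, host processing:     ", host_traffic)
print("values moved, in-buffer processing:", in_buffer_traffic)
```

In this toy case the bus carries 4 values instead of 36. Real savings depend on layer shapes and on what else has to cross the bus, so the ratio is only indicative.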

Low power mobile technology (>20%)
• Motivation for low power mobile
• LP4X / LP5
• PIM

Motivation for low power mobile
• PC-level graphic performance and mobile power budget
• Power is continuously increasing with limited thermal budget
[Figure: Performance vs. TDP*. GFLOPS (GPU), 1K to 5K, against TDP [Watt], 0 to 300, for Mobile, Notebook and Desktop. Marks PC graphic performance (Oculus Rift + GTX card) and the power gap.]
[Figure: Power Dissipation Trend. Power dissipation [W], 10^-2 to 10, by year ('00 to '20). Dynamic power and static power against the thermal limit of a hand-held device, with lower power design.]
* TDP: Thermal Design Power
Source: Samsung

Lower power solution, LP4X
• LP4X: 4266Mbps, VDDQ/VDD = 0.6V/1.1V
‒ IO power reduction with 0.6V VDDQ: a good example of a small change with a big gain
[Figure: LP4X Idea. LP4 driver at 1.1V with VOH = VDDQ/3 and VREF = VOH/2; LP4X driver at 0.6V (VDDQL, from AP) with VOH = VDDQ/2 and VREF = VOH/2. Same swing, same VOH, half-level VDDQ. Circuit labels: pre-driver, MNUP, MNDW, channel, Rterm, DQ, VO over 1 UI.]
[Figure: LP4X Power Reduction. IO and core power for LP4 3200, LP4 3733, LP4 4266 and LP4X 4266. IO power -45%, 18% total power saving.]
• Conditions: IDD4R (VDDQ+VDD2) spec value / 50% data change each burst transfer / process node contribution included
Source: Samsung
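
The -45% IO figure is consistent with a simple first-order model, sketched below under stated assumptions. If the signal swing, and therefore the current through the termination, stays the same, IO power scales linearly with VDDQ. The model and the 40% IO share used in the second step are assumptions for illustration, not slide data, and the slide's 18% total saving also includes a process node contribution.

```python
def io_power_saving(vddq_old, vddq_new):
    """Fractional IO power saving when only VDDQ changes and the drive current is unchanged."""
    return 1 - vddq_new / vddq_old


def total_power_saving(io_share, io_saving):
    """Total saving when only the IO part of the power budget shrinks."""
    return io_share * io_saving


io_saving = io_power_saving(vddq_old=1.1, vddq_new=0.6)
print(f"IO power saving: {io_saving:.1%}")

# Hypothetical IO share of total device power, chosen only for illustration.
ASSUMED_IO_SHARE = 0.4
print(f"Total saving at a 40% IO share: {total_power_saving(ASSUMED_IO_SHARE, io_saving):.1%}")
```

The point of the slide survives the simplification: the supply rail changes and the signal seen by the receiver does not, so the saving comes almost for free on the interface side.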

LP5 target & ideas
• LP5: 6400Mbps, VDDQ/VDD < 0.6V/1.1V
‒ Extremely high bandwidth (~6.4Gbps) and smart power reduction (~20%)

Pin Speed
- LP2 : 800Mbps ~ 1066
- LP3 : 1600Mbps ~ 1866
- LP4 : 3200Mbps ~ 3733
- LP4X : 4266Mbps

LP5 ideas
- CMD-based data CLK (WCK): over 50% IDD2N reduction
- WCK center-tap termination: over 5% IDD4W/4R reduction
- Deep sleep mode: over 30% IDD6 reduction

[Figure: Power Efficiency Trend [mW/Gbps] across LP2, LP3, LP4, LP4X and LP5, with step labels 35%, 39%, 18% and 20%.]
[Figure: burst timing, BL0 to BL15. LPDDR4 labels: CK/CK, DQS_C/T, DQ. LPDDR5 labels: CK (single-ended), WCK, DQ.]
Source: Samsung
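
Reading the four percentages on the power efficiency trend as successive generation-to-generation reductions in mW/Gbps (an assumption about how the chart labels map to generations), the cumulative effect can be worked out as follows. The names are illustrative.

```python
# Assumed reading of the chart: reduction in mW/Gbps relative to the previous generation.
STEPS = [("LP3", 0.35), ("LP4", 0.39), ("LP4X", 0.18), ("LP5", 0.20)]


def relative_energy(steps, baseline="LP2"):
    """Energy per bit of each generation relative to the baseline generation."""
    level = 1.0
    levels = {baseline: level}
    for name, reduction in steps:
        level *= 1 - reduction
        levels[name] = level
    return levels


for name, level in relative_energy(STEPS).items():
    print(f"{name}: {level:.2f} x LP2 energy per bit")
```

Under this reading LP5 would move a bit for roughly a quarter of the energy LP2 needed, while the pin speed listed on the slide rises from 800Mbps to 6400Mbps.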

PIM, Lower power processing
• Memory off-loading for reduced power consumption
‒ Reduce unnecessary data transfer and frame rate control
• Collaborate with SoC/AP for performance/power
‒ PoC with special memory for post/pre-processing
[Figure: left, Memory B/W Traffic: CIS, AP, VPU and Display on AMBA AHB, with severe data traffic and a limited power budget. Right, Memory Off-loading Solution: CIS, AP and Display with Pre/Post Processing In Memory (recognition, distortion correction, FRC).]

Conclusion
• Memory requirements have become stricter over time with respect to
performance, power, and cost
• Samsung keeps innovating its technology to meet those requirements
‒ Make efforts to extend the value of traditional memory
‒ Develop innovative memory solutions
• Close collaboration with partners is essential for delivering the right
memory solution.

kjh5555@samsung.com