Silicon Photonics in Post Moore’s Law Era:
Technological and Architectural Implications
Ke Wen*, Sébastien Rumley, Payman Samadi, Christine P. Chen, Keren Bergman
Department of Electrical Engineering, Columbia University, New York, *email@example.com
1 A RIPPLE EFFECT FROM CACHE REDUCTION
Moore’s law is ending as we enter the last years of shrinking transistors. Chip designers will thus have to use the available transistors more effectively. One may already discern a few signs of this trend. As shown in Figure 1, the area devoted to cache on a CPU chip is decreasing, both in terms of (a) MB per GFLOPS and (b) normalized chip area, defined as cache size (MB) × feature size (nm²) / die size (mm²). In particular, a sharp fall (Fig. 1a) is clear as the industry enters the many-core era (around 2013). Interestingly, this cache cliff matches the time when Moore’s Law was said to be dead in the economic sense: starting from 2013, the number of transistors bought per dollar has stayed stagnant. The fact that chipmakers are willingly trading cache area for more FLOPS, along with the rise of data-centric throughput computing, calls for significantly higher off-chip memory bandwidth. Fig. 1c shows this trend: the sharp increase of the off-chip memory bandwidth matches the cache cliff of Fig. 1a. This increase, however, is still not enough to balance the FLOPS increase, as the bytes per flop ratio continues to drift away from the ideal point.
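The three ratios used in this section can be written down as a short sketch. The chip parameters below are hypothetical and are not data points from Figure 1; the sketch also assumes that "feature size (nm²)" means the square of the process node in nanometers.

```python
def cache_mb_per_gflops(cache_mb, peak_gflops):
    """Cache capacity per unit of compute (the metric of Fig. 1a)."""
    return cache_mb / peak_gflops


def normalized_cache_area(cache_mb, feature_nm, die_mm2):
    """Cache size (MB) x feature size (nm^2) / die size (mm^2).

    Assumption: the feature-size term is the process node squared.
    """
    return cache_mb * feature_nm ** 2 / die_mm2


def bytes_per_flop(mem_bw_gb_s, peak_gflops):
    """Off-chip memory bandwidth (GB/s) per unit of compute (GFLOPS)."""
    return mem_bw_gb_s / peak_gflops


# Hypothetical chip, for illustration only (not taken from the figure).
chip = {"cache_mb": 32.0, "peak_gflops": 1000.0, "feature_nm": 14.0,
        "die_mm2": 600.0, "mem_bw_gb_s": 400.0}

mb_per_gflops = cache_mb_per_gflops(chip["cache_mb"], chip["peak_gflops"])
norm_area = normalized_cache_area(chip["cache_mb"], chip["feature_nm"],
                                  chip["die_mm2"])
b_per_flop = bytes_per_flop(chip["mem_bw_gb_s"], chip["peak_gflops"])
print(mb_per_gflops, norm_area, b_per_flop)
```

Doubling the peak GFLOPS of this hypothetical chip while keeping cache and memory bandwidth fixed halves both the MB per GFLOPS and the bytes per flop, which is the drift the text describes.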
There is more to this grim picture. The memory bandwidth increase is also rapidly stressing the pin count limit of the processor package. For example, KNL requires 3647 pins in the socket, plus 1024 pins in the interposer for each of the eight on-package memory stacks. The pin density of a standard chip package, however, cannot scale indefinitely. The ever-increasing bandwidth demand thus requires a more efficient chip I/O technology for processors beyond Moore’s Law.
[Figure 1. Axes: Cache Size (MB) Per GFLOPS; Normalized Cache Area, cache size (MB) × feature size (nm²) / die area (mm²); Off-Chip Memory Bandwidth (GB/s); Memory Bandwidth Per GFLOPS (Bytes/Flop). Processors shown: Xeon X5670 (Tianhe), IBM BQC (Sequoia), AMD Opteron 6274 (Titan), Intel Xeon E5-2692 v2 (Tianhe-2), KNC/Intel Xeon Phi, SW26010 (Sunway TaihuLight).]
Fig. 1. (a) Cache size normalized by GFLOPS; (b) normalized cache area; (c) off-chip bandwidth; (d) SiP interposer based architecture; (e) SiP-enabled high-capacity HBM-based node; (f) alleviating hotspot; (g) optimizing core-memory affinity; (h) Flexfly network.
2 SILICON PHOTONICS FOR CHIP I/O
Compatible with CMOS lithography fabrication, silicon photonics (SiP) has become one of the leading solutions to the aforementioned chip I/O issue. An example of “extending the power of silicon to new arenas”, SiP leverages the transparency of silicon to light of 1.2 to 5 µm wavelength for high-speed transmission. Each SiP waveguide can support terabit/s bandwidth, orders of magnitude higher than what can be achieved with conventional electrical I/O. For example, while an 8-channel (4-layer) High Bandwidth Memory (HBM) cube requires a 1024-bit bus for 100 GB/s, a single SiP waveguide can provide the same bandwidth with 32 wavelengths each at 25 Gb/s.
Silicon photonics is compatible with the silicon interposers used to carry processor and memory chips, forming a high-bandwidth chip-to-chip interconnect on package (Fig. 1d). Components such as waveguides, modulators, photodetectors and switches can be fabricated directly on the silicon interposer at low cost. The SiP switch, controllable by the processor, can provide flexible and transparent connection between any memory stack and any processor interface. A SiP interposer fabricated by PECST of Japan was reported to achieve a bandwidth density of 6.6 Tb/s/cm².
Another important aspect of SiP is extending high-bandwidth I/O off package, enabled by efficient coupling
between waveguides and fibers. This is a much-needed capability, as the interposer area (about 700 mm²) will limit the capacity of on-package memory (OPM). The current solution is to pair the fast OPM with slow, off-package DRAM. Such a small-fast, large-slow split may significantly complicate application programming and memory management. The distance-independent transmission of photonics can solve this problem, enabling a uniform, high-capacity HBM architecture as shown in Fig. 1e. With 1 Tb/s bandwidth per fiber, four fibers can supply the 256 GB/s bandwidth needed by an HBM2 cube. With 24 fibers per coupling assembly and four such assemblies, an interposer hosting processors can connect to a total of 24 HBM cubes, accounting for 192 GB of memory capacity and 6 TB/s of aggregate bandwidth. SiP technologies can thus enable a flat, easy-to-manage memory hierarchy.
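The bandwidth and capacity figures quoted in this section can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 TB/s = 1000 GB/s, 8 bits per byte) and an 8 GB capacity per HBM cube, which is implied by 192 GB over 24 cubes rather than stated directly.

```python
import math


def wdm_bandwidth_gbytes_s(n_wavelengths, gbit_s_per_wavelength):
    """Aggregate bandwidth of one waveguide in GB/s."""
    return n_wavelengths * gbit_s_per_wavelength / 8


def min_fibers_for_cube(cube_gbytes_s, fiber_tbit_s):
    """Smallest number of fibers whose total rate covers one cube."""
    cube_tbit_s = cube_gbytes_s * 8 / 1000
    return math.ceil(cube_tbit_s / fiber_tbit_s)


# 32 wavelengths at 25 Gb/s each match the 100 GB/s HBM bus.
waveguide_gbytes_s = wdm_bandwidth_gbytes_s(32, 25)

# A 256 GB/s HBM2 cube needs 2.048 Tb/s, so the four 1 Tb/s fibers
# allotted per cube in the text cover it with headroom.
min_fibers = min_fibers_for_cube(256, 1.0)
FIBERS_PER_CUBE = 4  # value used in the text

# 24 fibers per coupling assembly, four assemblies per interposer.
total_fibers = 24 * 4
cubes = total_fibers // FIBERS_PER_CUBE

GB_PER_CUBE = 8  # assumption: implied by 192 GB / 24 cubes
capacity_gb = cubes * GB_PER_CUBE
aggregate_tb_s = cubes * 256 / 1000

print(waveguide_gbytes_s, min_fibers, cubes, capacity_gb, aggregate_tb_s)
```

The aggregate comes out at 6.144 TB/s, which the text rounds to 6 TB/s.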
3 ARCHITECTURAL IMPLICATIONS
3.1 Node Level: Optimizing Memory Locality
The benefit of silicon photonics is not limited to sheer bandwidth growth. As mentioned earlier, the reconfigurable SiP switch can connect any memory cube to any processor interface. This functionality can help deliver memory data precisely to the consumer cores without traversing the network on chip (NoC), effectively mitigating the NUMA problem of the many-core era. As shown in Fig. 1f, a reconfiguration of the SiP switch can reduce the NoC hop count from 10 (dashed yellow, as in the native connection) to 1 (solid yellow). This hop decrease immediately translates into a few tens of nanoseconds less latency and a significant drop in energy dissipation. Routing high-speed memory data out of the NoC plane may also free NoC bandwidth for core-to-core communication, which matters as the “MPI everywhere” trend (assigning each core an MPI process) emerges. SiP waveguides with an ultra-low loss of 1.2 dB/m have been demonstrated, meaning nearly distance-independent energy consumption at chip scale, as compared to 25 pJ per 64 bits per mm when moving data electrically on chip.
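A rough sketch of this comparison follows. The 25 pJ per 64 bits per mm and 1.2 dB/m figures come from the text; the 20 mm path length and the 3 ns per-hop latency are hypothetical values chosen only for illustration. The optical side covers waveguide propagation loss only, not laser, modulator or receiver energy.

```python
def electrical_energy_pj(distance_mm, pj_per_64bit_mm=25.0):
    """Energy to move one 64-bit word electrically over distance_mm."""
    return pj_per_64bit_mm * distance_mm


def waveguide_loss_db(distance_mm, loss_db_per_m=1.2):
    """Propagation loss of a SiP waveguide over distance_mm."""
    return loss_db_per_m * distance_mm / 1000.0


def transmitted_fraction(loss_db):
    """Fraction of optical power remaining after loss_db of attenuation."""
    return 10 ** (-loss_db / 10.0)


def hop_latency_saving_ns(hops_before, hops_after, ns_per_hop):
    """Latency saved by shortening the NoC path."""
    return (hops_before - hops_after) * ns_per_hop


distance_mm = 20.0  # hypothetical chip-scale path length
energy_pj = electrical_energy_pj(distance_mm)
loss_db = waveguide_loss_db(distance_mm)
fraction = transmitted_fraction(loss_db)

# Hop counts from the text (10 to 1); 3 ns per hop is an assumption.
saving_ns = hop_latency_saving_ns(10, 1, 3.0)

print(energy_pj, loss_db, round(fraction, 4), saving_ns)
```

Under these assumptions the electrical cost grows linearly to 500 pJ per word over 20 mm, while the waveguide loses about 0.024 dB, so more than 99% of the optical power arrives regardless of where on the chip the data is delivered.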
Another possibility is to use the SiP switch to alleviate the hotspot effect on the NoC when memory accesses concentrate on a single hotspot memory (Fig. 1g). In this scenario, the SiP switch can use time-division multiplexing (TDM) to select the memory interface into which the data stream from the hotspot memory is injected, thus distributing the traffic over different NoC sections.
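The effect of TDM selection can be illustrated with a minimal sketch. The round-robin policy and the interface names are assumptions made for illustration; the text does not specify how time slots are assigned.

```python
from collections import Counter
from itertools import cycle, islice


def tdm_schedule(interfaces, n_slots):
    """Assign each time slot to one memory interface, round-robin.

    Round-robin is an assumed policy, not one prescribed by the text.
    """
    return list(islice(cycle(interfaces), n_slots))


# Hypothetical interface names on four different NoC sections.
interfaces = ["if0", "if1", "if2", "if3"]
n_slots = 8

# Native connection: the hotspot memory always injects at one interface.
static_load = Counter(["if0"] * n_slots)

# TDM selection: the SiP switch rotates the injection point every slot.
schedule = tdm_schedule(interfaces, n_slots)
tdm_load = Counter(schedule)

print(schedule)
print(max(static_load.values()), max(tdm_load.values()))
```

With four interfaces, the peak load on any one NoC entry point drops from eight slots to two, at the cost of reconfiguring the switch between slots.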
3.2 System Level: Flexible Topology
SiP switching can also be used to form flexible system-level topologies. The need for a flexible topology stems from the diverse spectrum of applications that run on a supercomputer. The clear differences in their communication characteristics, in terms of neighboring relationships, traffic volume, etc., make it very difficult to find a “best-for-all” topology. SiP switching, in contrast, is capable of dynamically “rewiring” the connections among a set of electronic endpoints. These electronic endpoints can be either compute nodes or electrical routers. The benefit is directing bandwidth to where it is needed without overprovisioning it. Recently, a reconfigurable Dragonfly architecture utilizing small-radix SiP switches has been demonstrated. The architecture, called Flexfly, is capable of concentrating the fully-connected group-to-group links of Dragonfly into, for example, a thick ring-like topology (Fig. 1h). It is shown to help applications like GTC achieve a 1.8x speedup over conventional adaptive UGAL routing.
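The idea of concentrating all-to-all links into a thick ring can be illustrated by counting links. This is a simplified sketch and not the Flexfly algorithm itself: it assumes one global link per pair of groups in the baseline, an odd number of groups so that links divide evenly, and it does not model the SiP switches that perform the rewiring.

```python
from itertools import combinations


def dragonfly_links(n_groups):
    """Baseline: one global link between every pair of groups (assumed)."""
    return {pair: 1 for pair in combinations(range(n_groups), 2)}


def thick_ring_links(n_groups):
    """Move the same number of links onto ring-neighbor pairs only."""
    if n_groups < 3 or n_groups % 2 == 0:
        raise ValueError("this sketch assumes an odd group count >= 3")
    per_neighbor = (n_groups - 1) // 2
    links = {}
    for g in range(n_groups):
        pair = tuple(sorted((g, (g + 1) % n_groups)))
        links[pair] = per_neighbor
    return links


def ports_per_group(links, n_groups):
    """Global ports used by each group under a given link assignment."""
    ports = [0] * n_groups
    for (a, b), count in links.items():
        ports[a] += count
        ports[b] += count
    return ports


n_groups = 5
full = dragonfly_links(n_groups)
ring = thick_ring_links(n_groups)

print(ports_per_group(full, n_groups), ports_per_group(ring, n_groups))
print(full[(0, 1)], ring[(0, 1)])
```

Each group uses the same four global ports in both cases, but the bandwidth between ring neighbors doubles, which is the kind of redistribution that benefits applications with nearest-neighbor traffic.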
4 PHOTONIC-ELECTRONIC INTEGRATION
There are three methods for integrating SiP and electronic
devices: front-end, back-end and hybrid integrations.
In front-end integration, electronic and photonic devices are formed on the same layer. The advantage is that nanophotonics can piggyback on the mask. However, the challenge remains to guide light with sufficient isolation, in particular sufficient separation between the waveguide and the silicon substrate. While CMOS-SOI uses a thin buried oxide (BOX) of 200 nm, photonics SOI requires a BOX of 1 µm. Approaches that use a thicker BOX have been proposed [12, 13]. This may, however, reduce the heat dissipation capability of the electronics. Methods that do not modify the standard CMOS process have thus been proposed, which locally remove the underlying Si substrate to
Back-end integration is another monolithic method
[18, 19]. It allows deposition of sufficient isolation oxide
on top of the existing CMOS-SOI. However, this method
introduces additional back-end steps and thus extra cost.
The back-end method may also face a stricter thermal
budget in order to prevent damage to electronic CMOS.
As a result, engineers have to consider other materials for the photonic layer. To date, silicon nitride, amorphous silicon and laser-annealed polysilicon have been shown to be feasible materials.
The hybrid integration method forms photonic and
electronic circuits on separate chips and bonds them
through flip-chip bonding. As such, the photonic and
electronic chips can be each optimized using different
process flows. Hybrid integration is to date the most common choice among SiP research and development groups. Signaling speeds of 25 Gb/s and 50 Gb/s have recently been
demonstrated using flip-chip bonding.
5 CONCLUSION
The end of Moore’s Law comes at a time when efficient allocation of transistor real estate has become imperative for computing. The resulting cache reduction and the rise of data-centric throughput computing call for efficient off-chip, off-package data movement. Silicon photonics is potentially one of the solutions the computing world is looking for to continue performance growth. Yet industry-level electronic-photonic integration, system co-design and lower manufacturing costs are still to be realized. The architectural implications of silicon photonics at both node level and system level also require further investigation into how to enable new dimensions of performance improvement.
ACKNOWLEDGMENT
This work was supported by the U.S. Department of Energy Lawrence Berkeley National Laboratory under subcontract 7257488 and Sandia National Laboratories under contract PO 1319001.
REFERENCES
1. After Moore's Law. The Economist Technology Quarterly 2016;
2. W.J. Dally. The end of denial architecture and the rise of throughput computing. in Keynote speech at Design Automation Conference.
3. Intel. Expanding Moore's Law, Fall 2002 Update. 2002; Available - Expanding Moore's Law.pdf.
4. Y. Arakawa, T. Nakamura, Y. Urino, and T. Fujita, Silicon photonics for next generation system integration platform. IEEE Communications Magazine, 2013. 51(3): p. 72-77.
5. D. Unat, T. Nguyen, W. Zhang, M.N. Farooqi, B. Bastem, G. Michelogiannakis, A. Almgren, and J. Shalf. TiDA: High-Level Programming Abstractions for Data Locality Management. in International Conference on High Performance Computing. 2016.
6. K. Wen, H. Guan, D.M. Calhoun, D. Donofrio, J. Shalf, and K. Bergman, Reconfigurable Silicon Photonic Memory Interconnect in the Many-Core Era, in IEEE High Performance Extreme Computing Conference (HPEC). 2016: Waltham, MA.
7. W. Gropp. MPI at Exascale: Challenges for Data Structures and Algorithms. in European Parallel Virtual Machine/Message Passing Interface Users’ Group Meeting. 2009. Springer.
8. J.F. Bauters, M.L. Davenport, M.J.R. Heck, J.K. Doylend, A. Chen, A.W. Fang, and J.E. Bowers, Silicon on ultra-low-loss waveguide photonic integration platform. Optics Express, 2013. 21(1):
9. S. Rumley, D. Nikolova, R. Hendry, Q. Li, D. Calhoun, and K. Bergman, Silicon photonics for exascale systems. Journal of Lightwave Technology, 2015. 33(3): p. 547-562.
10. K. Wen, D. Calhoun, S. Rumley, X. Zhu, Y. Liu, L.W. Luo, R. Ding, T.B. Jones, M. Hochberg, and M. Lipson. Reuse distance based circuit replacement in silicon photonic interconnection networks for HPC. in 2014 IEEE 22nd Annual Symposium on High-Performance Interconnects. 2014. IEEE.
11. K. Wen, P. Samadi, S. Rumley, C.P. Chen, Y. Shen, M. Bahadori, J. Wilke, and K. Bergman, Flexfly: Enabling a Reconfigurable Dragonfly Through Silicon Photonics, in The International Conference for High Performance Computing, Networking, Storage and Analysis (SC). 2016: Salt Lake City, Utah.
12. S.K. Selvaraja, P. Jaenen, W. Bogaerts, D. Van Thourhout, P. Dumon, and R. Baets, Fabrication of photonic wire and crystal circuits in silicon-on-insulator using 193-nm optical lithography. Journal of Lightwave Technology, 2009. 27(18): p. 4076-4083.
13. Y. Vlasov, W.M.J. Green, and F. Xia, High-throughput silicon nanophotonic wavelength-insensitive switch for on-chip optical networks. Nature Photonics, 2008. 2(4): p. 242-246.
14. J.S. Orcutt, R.J. Ram, and V. Stojanović. Integration of silicon photonics into electronic processes. in SPIE OPTO. 2013. International Society for Optics and Photonics.
15. M. Georgas, B.R. Moss, C. Sun, J. Shainline, J.S. Orcutt, M. Wade, Y.H. Chen, K. Nammari, J.C. Leu, and A. Srinivasan. A monolithically-integrated optical transmitter and receiver in a zero-change 45nm SOI process. in 2014 Symposium on VLSI Circuits Digest of Technical Papers. 2014. IEEE.
16. J.S. Orcutt, A. Khilo, C.W. Holzwarth, M.A. Popović, H. Li, J. Sun, T. Bonifield, R. Hollingsworth, F.X. Kärtner, and H.I. Smith, Nanophotonic integration in state-of-the-art CMOS foundries. Optics Express, 2011. 19(3): p. 2335-2346.
17. J.S. Orcutt, B. Moss, C. Sun, J. Leu, M. Georgas, J. Shainline, E. Zgraggen, H. Li, J. Sun, and M. Weaver, Open foundry platform for high-performance electronic-photonic integration. Optics Express, 2012. 20(11): p. 12222-12232.
18. Y.H.D. Lee and M. Lipson, Back-end deposited silicon photonics for monolithic integration on CMOS. IEEE Journal of Selected Topics in Quantum Electronics, 2013. 19(2): p. 409-415.
19. I.A. Young, E. Mohammed, J.T.S. Liao, A.M. Kern, S. Palermo, B.A. Block, M.R. Reshotko, and P.L.D. Chang, Optical I/O technology for tera-scale computing. IEEE Journal of Solid-State Circuits, 2010. 45(1): p. 235-248.
20. K. Furuya, K. Nakanishi, R. Takei, E. Omoda, M. Suzuki, M. Okano, T. Kamei, M. Mori, and Y. Sakakibara, Nanometer-scale thickness control of amorphous silicon using isotropic wet-etching and low loss wire waveguide fabrication with the etched material. Applied Physics Letters, 2012. 100(25): p. 251108.
21. G. Denoyer, C. Cole, A. Santipo, R. Russo, C. Robinson, L. Li, Y. Zhou, B. Park, F. Boeuf, and S. Crémer, Hybrid silicon photonic circuits and transceiver for 50 Gb/s NRZ transmission over single-mode fiber. Journal of Lightwave Technology, 2015. 33(6): p. 1247-1254.