Kudos to engineers at IBM for staring down the eDRAM doubters over the last decade. By committing to using embedded DRAM for its Power 6 processor at the 45-nm node next year, IBM has played its eDRAM card at just the right moment.
It has become fairly clear that IBM is unlikely to have a server MPU with high-k and metal gates in the same time frame as Intel. By committing to eDRAM for the large L3 caches in its 45-nm processors, IBM has come up with its own form of advantage at the 45-nm node: bigger L3 cache size.
IBM detailed its SOI-based eDRAM technology at the International Solid-State Circuits Conference on Wednesday (Feb. 14), and in a technology paper late last year at the International Electron Devices Meeting, both in San Francisco. By shortening the bit line and using fast SOI technology, IBM engineers reduced the latency to 1.5 ns and the cycle time to 2 ns.
At 65 nm, the eDRAM cell size is 0.068 square microns, compared with about 0.40 square microns for the SRAM cells used in high-performance servers. That 6X density improvement drops to a 3X at-use difference when the eDRAM overhead -- refresh circuitry, charge pumps, and other peripheral circuits -- is included.
Subramanian S. Iyer, director of 45-nm technology development at the IBM Systems and Technology Group, said that as the L3 macros increase in size, the overhead circuits will be shared more widely and eDRAM could become 4X more dense than SRAM.
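The density arithmetic can be checked with a rough sketch. The cell sizes and the 3X and 4X at-use ratios come from the figures above; the function names, and the simplifying assumption that all overhead is charged to the eDRAM side while SRAM peripheral area is ignored, are illustrative only and not IBM's accounting.

```python
# Cell areas quoted for the 65-nm node, in square microns.
EDRAM_CELL_UM2 = 0.068
SRAM_CELL_UM2 = 0.40


def raw_density_ratio(sram_cell: float, edram_cell: float) -> float:
    """Cell-to-cell density advantage of eDRAM over SRAM."""
    return sram_cell / edram_cell


def implied_overhead_per_bit(sram_cell: float, edram_cell: float,
                             at_use_ratio: float) -> float:
    """Overhead area per eDRAM bit implied by a given at-use ratio.

    Assumption (hypothetical model): at_use_ratio equals
    sram_cell / (edram_cell + overhead), with no SRAM overhead counted.
    """
    return sram_cell / at_use_ratio - edram_cell


if __name__ == "__main__":
    raw = raw_density_ratio(SRAM_CELL_UM2, EDRAM_CELL_UM2)
    print(f"raw cell ratio: {raw:.1f}X")
    for ratio in (3.0, 4.0):
        extra = implied_overhead_per_bit(SRAM_CELL_UM2, EDRAM_CELL_UM2, ratio)
        print(f"{ratio:.0f}X at use implies {extra:.3f} sq. microns of overhead per bit")
```

Under this simplified model, moving from 3X to 4X means roughly halving the overhead area amortized over each bit, which fits Iyer's point about sharing the peripheral circuits across larger macros.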
That density delta will allow IBM to throw more L3 memory bits into its Power processors. If the rule of thumb holds that a 4X increase in cache size yields a 2X reduction in the cache miss rate, IBM could gain a significant system-level performance advantage for its Unix-based servers next year.
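The rule of thumb amounts to saying the miss rate scales with the inverse square root of cache size. A minimal sketch follows, assuming that power law holds across the range of interest; the function name and the default exponent are illustrative, and real miss rates depend heavily on the workload.

```python
def relative_miss_rate(size_ratio: float, exponent: float = 0.5) -> float:
    """Miss rate of the larger cache relative to the baseline cache.

    Assumes miss_rate is proportional to size ** -exponent. An exponent
    of 0.5 reproduces the "4X the size, half the misses" rule of thumb.
    """
    if size_ratio <= 0:
        raise ValueError("size_ratio must be positive")
    return size_ratio ** -exponent


if __name__ == "__main__":
    # 3X and 4X are the at-use density advantages discussed for eDRAM.
    for ratio in (3.0, 4.0):
        rel = relative_miss_rate(ratio)
        print(f"{ratio:.0f}X cache: miss rate falls to {rel:.2f} of baseline")
```

By this estimate, a 3X larger L3 in the same area would cut misses to a little under 60 percent of the baseline, and a 4X larger one to half.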
What about costs? Embedded DRAM often has been advertised to be only 20-30 percent more expensive than SRAM, but at the end of the day the critical trench-formation masks drove the cost delta too high for most consumer applications.
Iyer said IBM’s SOI eDRAM requires three additional masks: the critical deep-trench mask and two block implant masks needed to tailor the well and threshold of the pass gate. “In SOI, the trench processing is greatly simplified. We use the buried oxide as an isolation layer between the plate and the device. This eliminates about half the processing needed for the deep trench,” he said.
All the DRAM-specific levels involve dry lithography, keeping scanner costs down. The result is a cost adder of less than 10 percent for an SOI-based chip with 10 levels of metal.
With nearly SRAM-like performance, three or four times the density of static RAM macros, and a cost adder of 10 percent or less, IBM’s memory engineers have handed IBM’s system designers a potent weapon in the battle to reduce cache miss rates.