The 1 Billion Transistor Processor: Who Will Be First?
Vu Ho, Don Scansen and Ed Keyes, Semiconductor Insights Inc., Ottawa, Canada -- Semiconductor International, 3/1/2003
Historically, processor performance has been a key driving force behind semiconductor technology innovations — a 30% dimension reduction delivers a twofold increase in transistor density and a ~50% increase in device speed primarily because of the shorter carrier transit.1
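The density figure follows from geometry: if every linear dimension shrinks by 30%, the area per transistor scales as 0.7 squared. The short Python sketch below works through that arithmetic. The function name and the assumption of an ideal, uniform shrink are illustrative and not taken from the article.

```python
def density_gain(linear_shrink: float) -> float:
    """Ideal transistor-density multiplier for a fractional linear shrink.

    Assumes every layout dimension shrinks by the same fraction,
    which is an idealization of a real process transition.
    """
    scale = 1.0 - linear_shrink       # a 0.30 shrink leaves dimensions at 0.70x
    return 1.0 / (scale * scale)      # area per transistor scales as scale^2


if __name__ == "__main__":
    # A 30% shrink gives a gain close to the twofold figure quoted in the text.
    print(f"density gain: {density_gain(0.30):.2f}x")
```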
Recently, ATI Technologies (Markham, Canada) brought to market its Radeon 9700, a graphics processor with more than 100 million transistors. Rival Nvidia (Santa Clara, Calif.) will debut its 100 million transistor device, the NV30, early this year.
Figure 1 shows the Radeon 9700 die, stripped to metal 2. It measures 14.8 × 14.8 mm and does not contain significant amounts of cache memory; hence, the logic transistor count on this device is truly 100 million. The race is now on to reach the next significant integration milestone — the 1 billion transistor (1G) processor.
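The die dimensions and transistor count above imply an average logic density, which gives a feel for how much scaling a 1G device requires. The sketch below is a rough illustration only: the helper name is hypothetical, and the "same density" scenario is an assumption used to show scale, not a prediction from the article.

```python
def transistor_density_per_mm2(transistors: float, die_w_mm: float, die_h_mm: float) -> float:
    """Average transistor density over the whole die, in transistors per mm^2."""
    return transistors / (die_w_mm * die_h_mm)


# Figures quoted in the text: ~100 million transistors on a 14.8 x 14.8 mm die.
radeon_density = transistor_density_per_mm2(100e6, 14.8, 14.8)

# Hypothetical scenario: die area a 1 billion transistor device would need
# if density did not improve at all.
area_1g_mm2 = 1e9 / radeon_density

if __name__ == "__main__":
    print(f"Radeon 9700 average density: {radeon_density:,.0f} transistors/mm^2")
    print(f"1G device at the same density: {area_1g_mm2:,.0f} mm^2")
```

At unchanged density the die would be ten times larger, which is why the 1G milestone depends on process scaling rather than die growth.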
The 1G processor will be a significant milestone in terms of sheer integration density, but also — and more importantly — in terms of unprecedented functionality and processing power. In this article, we will review current 100 million transistor technology and outline the necessary developments for the fabrication of 1 billion transistor processors. The scope of this article is limited to the required semiconductor technology. Advances on other fronts including design tools, test methodology and packaging will also be needed.
Figure 1. The top view of an ATI Radeon device.
Many obstacles have been overcome to realize the 100 million transistor processor. It has required innovations in materials, equipment, maskmaking and process technology, combined with advancements in design and test from R&D teams around the world.
The main challenges in reaching higher levels of integration include not only finer lithography and etch processes, but also vertical scaling of junctions and gate dielectrics to optimize transistor performance and, more recently, advanced interconnect to minimize RC delay.
The perceived limit of optical lithography due to diffraction has been continually pushed with deeper UV lithography. The resulting depth of focus problem, at the gate level in particular, has been accommodated by the adoption of both shallow trench isolation and chemical mechanical planarization (CMP) at all levels. RC interconnect delay has been managed with the introduction of copper damascene processing and lower-k dielectrics for the intermetal dielectric (IMD).
At the gate and diffusion level, polycide and silicided junctions are employed to reduce parasitic resistance. Tungsten silicide and titanium salicide were used traditionally, but have now been replaced by cobalt salicide. Nitride sidewall spacers were introduced to further reduce the resistance of the LDD regions through enhanced gate coupling.
The largest speed improvements have perhaps come from aggressively shrinking the gate length to well below the nominal process dimension. This is especially true for the high-performance transistors found in high-end processors. The 2001 International Technology Roadmap for Semiconductors (ITRS) predicts a physical gate length of just 32 nm for the 80 nm node.
Figure 2. A high-magnification TEM cross section of an NMOS transistor.
Figure 2 is a TEM cross section of a minimum-gate-length NMOS transistor. Although fabricated in a nominal 0.13 µm, 6-metal process, the physical gate length is 50 nm. Note the nitride sidewall spacers and cobalt salicide of the gate and source/drain regions. Figure 3 shows the six layers of copper damascene interconnect structure. The SIMS profile in Figure 4, taken through the IMD stack, reveals the use of low-k fluorinated oxide in the IMD levels 1 through 3. A spreading resistance profile through the device wells (Fig. 5) reveals a 1.7-µm-deep N-well and 1.0-µm-thick P-well on a 3-µm-thick, lightly doped epitaxial layer.
Figure 3. A TEM lattice fringe image of the gate oxide.
Based on Moore's Law, the 1 billion transistor processor should be in commercial production as early as 2007. Intel Fellow John Crawford recently conceptualized a 1 billion transistor processor, containing four Intel Itanium 2 cores and a shared cache memory. Fabricated at the 65 nm technology node, it would use a gate length of 30 nm and an equivalent oxide thickness of ~0.8 nm.3
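The timing of a tenfold jump in transistor count depends strongly on the doubling period assumed for Moore's Law, which the article does not state. The sketch below shows the relationship; the function name and the three doubling periods are illustrative assumptions.

```python
import math


def years_to_scale(factor: float, doubling_period_years: float) -> float:
    """Years for transistor count to grow by `factor` at a fixed doubling period."""
    return math.log2(factor) * doubling_period_years


if __name__ == "__main__":
    # Doubling periods are assumptions; commonly quoted values range from 1 to 2 years.
    for period in (1.0, 1.5, 2.0):
        years = years_to_scale(10.0, period)
        print(f"doubling every {period} yr -> 10x in {years:.1f} yr")
```

Going from 100 million to 1 billion transistors takes about 3.3 doublings, so the projected date moves by several years across this range of assumptions.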
Current CMOS transistor structures cannot be simply scaled to these dimensions without serious problems. The main challenges lie in the area of the gate dielectric, gate electrode, substrate and device structure, and device interconnects. The rest of this article will discuss key scaling issues and possible solutions in each of these areas.
Figure 5. A SEM cross section of the interconnect in an SRAM cell.
Gate dielectric
Among many possible high-k dielectrics, HfO2 with a dielectric constant of ~22 is the most promising for near-term implementation because of its superior thermal stability with polysilicon and high bandgap.4 Preliminary results from several research laboratories have confirmed that an HfO2 gate dielectric suppresses gate leakage while maintaining current drive.5,6 For an equivalent transistor drive current, the thickness of HfO2 can be 5.6× greater than that of SiO2 (k=3.9). For 65 nm technology, the required HfO2 thickness would be ~4 nm. Key HfO2 issues are achieving acceptable interface state densities, stability, defect densities and hot-carrier reliability.
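The 5.6× figure is the ratio of the two dielectric constants, since films with equal capacitance per unit area satisfy t_highk / k_highk = t_SiO2 / k_SiO2. The sketch below makes the conversion explicit. It uses the k values quoted in the text, ignores any interfacial oxide layer (a simplifying assumption), and the function names are illustrative.

```python
K_SIO2 = 3.9    # dielectric constant of SiO2, as quoted in the text
K_HFO2 = 22.0   # approximate dielectric constant of HfO2, as quoted in the text


def physical_thickness_nm(eot: float, k_highk: float, k_ref: float = K_SIO2) -> float:
    """Physical high-k thickness giving the same capacitance as `eot` nm of SiO2."""
    return eot * k_highk / k_ref


def equivalent_oxide_thickness_nm(physical: float, k_highk: float, k_ref: float = K_SIO2) -> float:
    """SiO2-equivalent thickness (EOT) of a high-k film of `physical` nm."""
    return physical * k_ref / k_highk


if __name__ == "__main__":
    print(f"thickness ratio HfO2/SiO2: {K_HFO2 / K_SIO2:.2f}")
    print(f"EOT of 4 nm HfO2: {equivalent_oxide_thickness_nm(4.0, K_HFO2):.2f} nm")
```

Under these assumptions a ~4 nm HfO2 film corresponds to an equivalent oxide thickness of roughly 0.7 nm.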
An alternative to high-k dielectrics is heavily nitrided oxide. NEC recently demonstrated a 100× reduction in gate leakage and a 10× increase in reliability for a 1.5 nm heavily nitrided gate oxide.7 The 2001 ITRS speculated that such oxides might be adequate until at least 2007 for high-performance applications in which power consumption is not as important as raw speed. Thickness and uniformity control of what will be a sub-1.0 nm layer, however, will be a challenge.
The gate electrode
Currently, dual-doped polycide is employed as the transistor electrode. The polysilicon is heavily doped; nevertheless, a ~0.5-1 nm thick depletion region forms at the surface of the gate in contact with the insulator under bias.8 Since this depletion layer is in series with the gate oxide, there is an effective increase in the gate oxide thickness, resulting in less current drive.
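Because the depleted polysilicon acts as a capacitor in series with the gate dielectric, its effect can be estimated by converting it to an SiO2-equivalent thickness. The sketch below is a first-order estimate under stated assumptions: it treats the quoted depletion width as a physical silicon thickness, uses the textbook silicon permittivity of 11.7 (not from the article), and picks a 1.0 nm gate oxide purely for illustration.

```python
K_SIO2 = 3.9   # dielectric constant of SiO2
K_SI = 11.7    # dielectric constant of silicon (textbook value, an assumption here)


def depletion_eot_penalty_nm(depletion_nm: float) -> float:
    """SiO2-equivalent thickness added by a depleted polysilicon layer."""
    return depletion_nm * K_SIO2 / K_SI


def gate_capacitance_ratio(eot_nm: float, depletion_nm: float) -> float:
    """Effective gate capacitance relative to the ideal (no depletion) value.

    Two capacitors in series: the ratio reduces to eot / (eot + penalty).
    """
    return eot_nm / (eot_nm + depletion_eot_penalty_nm(depletion_nm))


if __name__ == "__main__":
    for depletion in (0.5, 1.0):   # range quoted in the text
        ratio = gate_capacitance_ratio(1.0, depletion)   # 1.0 nm oxide is illustrative
        print(f"{depletion} nm depletion -> {ratio:.0%} of ideal gate capacitance")
```

The penalty is fixed in absolute terms, so it consumes a larger share of the gate capacitance as the dielectric itself gets thinner.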
Employing a metal gate can eliminate the depletion region. Metal gates also address the problems of gate resistance and boron penetration of the gate dielectric in the PMOS gate.
Care must be taken, however, to match the work function of the metal to the NMOS and PMOS channels. The choice is either a single, mid-gap material for both NMOS and PMOS gates, or two different and separately optimized gate metals. A single, mid-gap material is simpler to process, but usually leads to a buried channel device with degraded characteristics. Super steep retrograde (SSR) channel doping will be required to maintain performance.9 Several refractory metals are being explored, including tungsten and molybdenum. Alternatively, a cobalt silicide gate has been demonstrated by complete silicidation of polySi.10 Recently, it was reported that both NMOS and PMOS transistors using HfO2 gate dielectric with TaSiN for NMOS and TiN for PMOS as the gate electrode material have been successfully fabricated.11 Crucial processing issues are compatibility of the gate and gate dielectric materials, and integration of a metal gate into normal CMOS process flow.
Silicon substrate and device structures
Currently, heavily doped silicon with a more lightly doped epitaxial layer is the substrate of choice for high-performance logic devices. This structure will also change in the next few years because of continually increasing speed requirements and the problem of exponentially increasing off current (Ioff).
Saturation drive current is strongly dependent on gate overdrive (VDD-Vt), but VDD must decrease with feature size to maintain roughly constant electric field values. Hence, Vt must be reduced along with VDD to maintain drive current. Ioff, however, increases exponentially as Vt decreases.10 Compounding this problem is the gradual loss of gate control over the channel region (short channel effect), which is an inevitable result of aggressive scaling. This manifests itself as increased values of sub-threshold swing, drain-induced barrier lowering (DIBL) and increased off current leakage.
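The exponential dependence of Ioff on Vt is commonly expressed through the sub-threshold swing S: each reduction of Vt by S raises leakage by one decade. The sketch below is a first-order model and is not taken from the article. The function names and the 85 mV/decade example swing are illustrative assumptions.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELECTRON_CHARGE_C = 1.602176634e-19


def ideal_swing_v_per_decade(temperature_k: float = 300.0) -> float:
    """Thermal lower bound on sub-threshold swing, (kT/q) * ln(10), in volts."""
    return BOLTZMANN_J_PER_K * temperature_k / ELECTRON_CHARGE_C * math.log(10.0)


def leakage_increase(delta_vt_v: float, swing_v_per_decade: float) -> float:
    """Factor by which Ioff grows when Vt is lowered by delta_vt_v volts."""
    return 10.0 ** (delta_vt_v / swing_v_per_decade)


if __name__ == "__main__":
    swing = 0.085   # assumed swing for a short-channel device, in V/decade
    print(f"ideal swing at 300 K: {ideal_swing_v_per_decade() * 1e3:.1f} mV/decade")
    print(f"100 mV lower Vt -> {leakage_increase(0.100, swing):.1f}x more leakage")
```

A degraded (larger) swing from short channel effects flattens this curve, which means Vt has to stay higher to hold the same off current.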
Based on data published by Intel, Ioff has increased by 10,000× from <1 pA/µm for a 0.5-µm-long gate to ~10 nA/µm for 0.07 µm gates in 0.13 µm technology.11 With continued scaling, it is projected that Ioff will be up to one order of magnitude higher for transistors with 30 nm gate lengths in 65 nm technology.11 If the number of transistors increases by 10× to 1 billion, power dissipation could be two orders of magnitude higher than in current 100M devices.
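The two-orders-of-magnitude estimate is the product of the per-transistor leakage increase and the transistor count increase. The sketch below reproduces that arithmetic. It assumes off-state power scales as transistor count × Ioff × VDD with unchanged transistor widths, and the function name and the optional supply-voltage term are illustrative additions.

```python
def leakage_power_ratio(ioff_ratio: float, transistor_ratio: float, vdd_ratio: float = 1.0) -> float:
    """Relative total off-state power, taking P_leak proportional to N * Ioff * VDD."""
    return ioff_ratio * transistor_ratio * vdd_ratio


# Historical growth quoted in the text: ~10 nA/um versus an upper bound of 1 pA/um.
historical_ioff_growth = 10e-9 / 1e-12

if __name__ == "__main__":
    print(f"historical Ioff growth: at least {historical_ioff_growth:,.0f}x")
    # 10x more leakage per transistor and 10x more transistors, VDD held constant.
    print(f"projected leakage power: {leakage_power_ratio(10.0, 10.0):.0f}x")
```

A lower supply voltage at the 65 nm node would trim this figure somewhat, which the optional `vdd_ratio` argument allows for.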
Off-state leakage must be reduced. The solution for the short channel effects is the silicon-on-insulator (SOI) substrate. SOI wafers have a layer of SiO2 insulator buried under the device layer.13 The buried oxide layer blocks leakage currents and restores long-channel-type behavior.14 This technology has been in use for several years on IBM's PowerPC products and, like copper interconnect, will eventually become mainstream.
Device interconnects
The copper dual-damascene process is the mainstream interconnect technology for 0.13 µm technology. In the Radeon 9700, eight levels of metal are used.
The 1G processor will be interconnect intense — for the 65 nm node, the ITRS predicts 10 levels of metal interconnect, local wiring pitches of 150 nm and 11 km of interconnect length per square centimeter of die. Fortunately, solutions for device interconnect, at least for the 1G processor, are relatively straightforward. Copper will continue to be the metalization of choice. However, the IMD will quickly migrate from current fluorinated silicate glass (FSG) to lower-k materials. The most likely candidates are either spin-on polymers such as Dow Chemical's SiLK or organic/inorganic hybrids such as Applied Materials' Black Diamond or Novellus' CORAL.15
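For a fixed wire geometry, line capacitance is proportional to the dielectric constant of the IMD, so RC delay falls in step with k. The sketch below illustrates the size of that benefit and the scale of the wiring involved. The k values (about 3.6 for FSG and 2.7 for a low-k candidate) are typical figures assumed for illustration and do not come from the article; the die area is the Radeon 9700 die size used as a hypothetical stand-in.

```python
def rc_delay_ratio(k_new: float, k_old: float) -> float:
    """Relative RC delay after a dielectric change, with wire geometry and metal fixed."""
    return k_new / k_old


def total_interconnect_km(km_per_cm2: float, die_area_cm2: float) -> float:
    """Total wire length on a die for a given interconnect density."""
    return km_per_cm2 * die_area_cm2


if __name__ == "__main__":
    k_fsg, k_lowk = 3.6, 2.7   # assumed typical values, not from the article
    print(f"RC delay with low-k: {rc_delay_ratio(k_lowk, k_fsg):.0%} of FSG value")
    # 11 km/cm^2 is the ITRS figure quoted in the text; 2.19 cm^2 is a 14.8 x 14.8 mm die.
    print(f"wire length: {total_interconnect_km(11.0, 2.19):.0f} km")
```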
Finally, although optical interconnects may not be necessary for the 1 billion transistor processor, they are of interest for driving clocks in future devices because they do not suffer from RC delay.
Conclusion
Beyond the 1G limit, new device structures will be required. Building on the inherent advantage of the SOI substrate, which eliminates the leakage path through the silicon substrate, several transistor structures have been proposed, including the single-gate fully depleted transistor (DST),16 double-gate transistor FinFET,17 planar double-gate transistor,18 and tri-gate fully depleted transistor.19
The FinFET is built by thinning the silicon layer on the buried oxide layer of the SOI wafer down to a few tens of nanometers, then etching it to form a narrow vertical fin that sticks up from the wafer surface. The channel of the device is formed in the fin, which rests on the insulator. Source and drain electrodes are built at each end of the fin and the gate drapes over both of its sides.
The planar double-gate transistor utilizes a novel wafer bonding process to form a gate electrode on both sides of the thin silicon active layer that is originally on a buried oxide layer.
The latest tri-gate device, in which the gate length equals the silicon body width and height, relaxes the silicon thickness requirement imposed by the single-gate structure.
With advances in silicon-based optoelectronics and MEMS, breakthroughs to facilitate the integration of on-chip optical interconnect are expected in the future. It is difficult to predict who will be the first to reach the benchmark of 1 billion transistors on a logic processor. It is unlikely to be the traditional microprocessor powerhouses like IBM or Intel, which design for high-volume, general-purpose applications. The drivers for the 1G chip will be high-performance computing (such as graphics processors) or highly integrated system-on-chip applications such as a single-chip telephone. There is little doubt, however, that 1G processors will arrive. Chip designers have always run out of transistors before running out of features.
Author Information
Vu Ho joined Semiconductor Insights in 2002 as a senior process analyst. He has more than 20 years of expertise through senior technical positions with Nortel and STMicroelectronics. He has a B.S., M.S. and Ph.D. in electronic engineering from Tokyo University.
Don Scansen is manager of Semiconductor Insights' process analysis department. He brings more than 10 years of semiconductor expertise to this role, including senior technical posts with the National Research Council of Canada and Chipworks. He has a B.S.E.E., M.S. and Ph.D. from the University of Saskatchewan.
Edward Keyes is vice president and chief technology officer of Semiconductor Insights. Since joining the company in 1991, he has held a series of progressively senior technical and management positions, including manager of process analysis, director of intellectual property services, and vice president of operations. He has a B.S. in applied physics from the University of Waterloo, and an M.S. in electrical engineering from Carleton University.
References