The High Road to Process Control
Robert H. McCafferty, RHM Consulting, Sandy Hook, Conn. -- Semiconductor International, 6/1/2001
Once programmed, and applied within a restricted domain that harbors no surprises, computers can do this equally well with tireless consistency for an indefinite period. Until programmed to a decision-making task, they're essentially worthless for all but rote recording purposes. So, if there's a way to can human expertise for recognition of events keying high-value decisions, the best of both worlds can be attained. Pattern recognition provides such a mechanism.
Background
Decisions made in real time can effectively be classed as either complex (calculating and recognizing events, such as endpoint in multi-step etching processes) or conformance. Both have considerable utility at a process level, since detecting when an etch is not proceeding normally (the conformance case) and halting processing before compromising further wafers is no less important than perfectly calculating process change points and calling endpoint for each wafer (the complex case).
Less obviously, this is also true at an equipment level, where a pattern recognition system is looking for parameter trajectories that conform to expectation and specific fault signatures among those that do not. Numerical recognition methods can also be engaged for the latter, particularly in cases where specific fault signatures (which essentially finger what to fix) do not exist. In either event, the importance of such decisions increases with larger, higher-value wafers [1], and with factories that operate their capital investment at higher throughput.
For univariate methods, process-level conformance decisions are essentially a sub-case of complex, event recognition decisions. Consequently, it is useful to detail the complex case first and subsequently extend discussion to conformance scenarios. At an equipment level, however, the easiest approach is generally to detect nonconformance numerically, then hunt down its root cause(s) with pattern recognition techniques. The objective of both efforts is to detect errant or idiosyncratic equipment behavior before it manifests itself as tool failure and potentially compromised product. For the savvy fab this will support a just-in-time maintenance methodology, where PM intervals are lengthened and incipient failures dealt with (as feasible) during equipment idle periods. Hence, product quality and equipment functionality remain separate issues, while available factory capacity is maximized.
Process-level decisions
On a batch reactor, skilled operators can do this with reasonable effectiveness — nothing moves too quickly for human decision mechanisms and relatively few process runs are required to etch an acceptable number of lots per day. Single-wafer etch equipment, however, which has a host of desirable properties with respect to batch machinery, changes the game completely — a plethora of wafers are now etched per shift at a rate human decision making simply cannot accommodate. This drives the need for increased mechanization, requiring either degraded process control (strictly timed etching), use of additional external instrumentation (such as emissions spectrometers, which bring a mixed blessing), or offload of rote but complex decision making from a human operator.
Pattern recognition
To pull off pattern recognition one must effectively have a "heartbeat" signal, typically identified through off-line research or measurement development, that telltales everything of interest within a process or system under analysis. This can be a composite of several signals, but then it must be separately confirmed that all are viable when needed, while the pattern recognition problem itself is multiplied (or worse, if interactions among signals, such as might constitute a dimensionless number, provide keynotes). The meaning of that heartbeat, and the patterns it should display over time, must then be formalized. For an endpoint trace this is known from basic interferometry, with the pattern simply dictated by how much film (and its character, as indicated by reflectance amplitude) should have been removed through a given stage in the process.
Other scenarios, germane to different industries and applications, such as arrhythmia on electrocardiograms, may or may not be so obvious. But once their signal meaning and expected behavior are known, the pattern recognition and event detection problem can be effectively addressed.
Pattern recognition can be accomplished by discriminant or syntactic methods, with the former extracting a vector of characteristic features from a pattern to conduct recognition by locating a feature vector in feature space. Because this can create an intractably large number of characteristic features, particularly with signals bearing time-varying shape and structure, syntactic recognition exploits hierarchical decomposition of complex shapes into simpler sub-patterns. This allows a virtually infinite set of shapes to be represented by only a small lexicon of primitives combined via an established grammar (set of rules). Parsing a signal applies those rules to identify a pattern as a function of given primitives.
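The decomposition into primitives can be illustrated with a minimal sketch. It assumes a uniformly sampled signal and only two primitives, "up" and "down"; the function names `to_primitives` and `count_bands`, and the `noise` deadband parameter, are hypothetical and are not taken from any product described here. A single grammar rule, a band being a trending-up run followed by a trending-down run, is then applied to the primitive sequence.

```python
from typing import List, Tuple

Run = Tuple[str, int, int]  # (label, start_index, end_index)


def to_primitives(samples: List[float], noise: float = 0.0) -> List[Run]:
    """Split samples into maximal 'up' / 'down' runs (illustrative only).

    Sample-to-sample changes no larger than `noise` are treated as a
    continuation of the current run rather than a change of direction.
    """
    runs: List[Run] = []
    label = None
    start = 0
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        if abs(delta) <= noise:
            continue
        step = "up" if delta > 0 else "down"
        if label is None:
            label = step
        elif step != label:
            # Direction reversed: close the current run at the turning point.
            runs.append((label, start, i - 1))
            label, start = step, i - 1
    if label is not None:
        runs.append((label, start, len(samples) - 1))
    return runs


def count_bands(runs: List[Run]) -> int:
    """Grammar rule (assumed): band := 'up' run immediately followed by 'down' run."""
    labels = [r[0] for r in runs]
    return sum(1 for a, b in zip(labels, labels[1:]) if (a, b) == ("up", "down"))


runs = to_primitives([0, 1, 2, 1, 0, 1])
print(runs)               # [('up', 0, 2), ('down', 2, 4), ('up', 4, 5)]
print(count_bands(runs))  # 1
```

A production parser would fit trained primitives to noisy data rather than compare adjacent samples, so this shows only the shape of the idea.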
Alternatively, the entire signal may be directly parsed into primitives, but this can be unwieldy and is clearly unnecessary in the case of a repetitive, time-varying signal. Such is particularly true when robust parameter values governing the fit of primitives to data can be statistically synthesized at the click of a mouse.
Solution logic
Specifically, we can parse the laser trace signal, as data arrives, into bands, disregarding leftovers that are irrelevant signal artifacts or as yet incomplete bands. For each pass of the parser, then, a solution algorithm can count the number of discovered peaks and troughs, capturing signal value and time for both as well as the trending-up or trending-down duration (Fig. 3). With that information, the resist-to-TEOS etching transition is recognized when:
- Four peaks and four troughs have occurred; and
- A signal peak is substantively (beyond the range of normal electronic noise) lower than its predecessor, or a trough is substantively higher, since TEOS's reflectance is lower than photoresist's and the signal amplitude shrinks accordingly.
Further, process change time (from the first to second etching steps) can be calculated as 1/4 band (half the duration of the preceding trending-up or trending-down) past the sixth peak and sixth trough. Finally, endpoint can be declared after nine peaks and eight troughs, with the first signal value (allowing small margin for noise) greater than the ninth peak or less than the eighth trough. So, given robust pattern recognition intelligence and an algorithm to orchestrate its operation, the etch process control and endpoint detect problem can readily be solved.
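The film-change rule above can be sketched in a few lines. This is a minimal illustration, assuming the parser has already reduced the trace to a time-ordered list of peaks and troughs; the `Extremum` record, the `film_change_time` function and the `noise` margin are hypothetical names introduced here, not part of the article's software.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Extremum:
    kind: str     # "peak" or "trough"
    time: float   # time at which the extremum was observed
    value: float  # signal value at the extremum


def film_change_time(events: List[Extremum], noise: float) -> Optional[float]:
    """Return the time of the first extremum showing reduced amplitude.

    Rule sketched from the text: at least four peaks and four troughs must
    have occurred, and a peak must be lower (or a trough higher) than its
    predecessor by more than the assumed noise margin.
    """
    peaks: List[Extremum] = []
    troughs: List[Extremum] = []
    for e in events:
        seq = peaks if e.kind == "peak" else troughs
        seq.append(e)
        if len(peaks) < 4 or len(troughs) < 4 or len(seq) < 2:
            continue
        previous = seq[-2].value
        if e.kind == "peak" and e.value < previous - noise:
            return e.time
        if e.kind == "trough" and e.value > previous + noise:
            return e.time
    return None


# Four full-amplitude bands, then a visibly smaller fifth peak.
trace = []
for n in range(4):
    trace.append(Extremum("peak", 2 * n + 1.0, 10.0))
    trace.append(Extremum("trough", 2 * n + 2.0, 0.0))
trace.append(Extremum("peak", 9.0, 8.0))
print(film_change_time(trace, noise=0.5))  # 9.0
```

Because the check keeps running past the fourth band, a wafer carrying more than four bands of photoresist is handled without any change to the rule.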
Generic solution structure
Generically, a solution algorithm — essentially a template characterizing the entire signal that is evaluated against signal data in real time by the parser — consists of sub-templates, constructs, variables and operators. The sub-templates are combinations of basic template elements (e.g., trending-up and trending-down) trained to match against specific signal features (such as a band) while variables are local to a specific span of the solution algorithm (hence, algorithm variables) or global across the entire template (attribute variables).
Boolean (>, >=, =, <=, <, AND, OR and NOT) and numerical (+, -, *, /) operators govern interaction among variables, which themselves are established by GET/PUT (for attribute variables) and SET! (for algorithm variables) statements initially. Logical OR (setting up if/then/else structures), logical AND (for including and simultaneously analyzing parallel signals), looping (to recognize and potentially act on one or more instances of a signal feature or sub-feature) and sequential program flow constructs govern solution template logic.
This occurs in what is best described as a graphical programming language, which builds, displays and executes template algorithms germane to the solution of a pattern recognition problem, and which produces "code" that looks more like a flowchart than a series of conventional programming statements.
Endpoint algorithm
Following this comes a branching (logical OR) structure, with branch paths evaluated by the parser from the topmost down (per convention), and the lower branch a "no-op" or drop through for logical escape until the upper branch evaluates "true" — as it will, but not until all conditions are correct. The upper branch's first statement is again an entry condition test for FilmChangeFlag (which is not strictly necessary here, but left in place for symmetry and clarity with the back half of this loop), succeeded by tests for four peaks and four troughs.
If the tests fail, logical escape is again taken through the lower branch, but once sufficient peaks and troughs have been counted a check will be made for reduction in peak amplitude associated with film change from resist to TEOS. If this latter test is not successful then the "no-op" will be selected as looping continues — since more than four bands of photoresist are actually possible in this process scenario — but if amplitude reduction is detected FilmChangeTime and FilmChangeFlag will be set appropriately.
The second half of this loop mirrors the first, only now engaging a trending-down pattern recognition element (for the latter half of a band), and operating with troughs rather than peaks. Once branching sections in either half of the loop detect film change the initial entry condition will fail, thus moving the parser to next calculate ProcessChangeTime and match the laser trace signal accordingly.
ProcessChangeTime calculation proceeds in a fashion analogous to the previous loop, matching, counting and recording peaks and troughs as before, then branching to detect six peaks and six troughs. Once that condition is attained, ProcessChangeTime is calculated as the last peak or trough time plus one half its preceding upstroke or downstroke duration, thus yielding a juncture to trigger process change corresponding to an etch depth 1¾ bands into the TEOS.
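The arithmetic of this step can be written out as a short sketch. It assumes extrema arrive as time-ordered tuples of kind, time and the duration of the upstroke or downstroke that led to them; the function name `process_change_time` and the tuple layout are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# (kind, time, stroke_duration): kind is "peak" or "trough"; stroke_duration
# is the length of the trending-up or trending-down segment that preceded it.
ExtremumRecord = Tuple[str, float, float]


def process_change_time(extrema: List[ExtremumRecord], needed: int = 6) -> Optional[float]:
    """Last extremum time plus half its preceding stroke, once `needed`
    peaks and `needed` troughs have been counted (six each in the text)."""
    counts = {"peak": 0, "trough": 0}
    for kind, time, stroke in extrema:
        counts[kind] += 1
        if counts["peak"] >= needed and counts["trough"] >= needed:
            return time + 0.5 * stroke
    return None


# Alternating peaks and troughs every 10 time units, each stroke lasting 10.
records = [("peak" if i % 2 == 0 else "trough", 10.0 * (i + 1), 10.0) for i in range(12)]
print(process_change_time(records))  # 125.0
```

Half a stroke is a quarter of a band, since a full band consists of one upstroke and one downstroke.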
Since nothing else in the laser trace signal is of particular interest until after the process change, a "sized-gap" (don't care) segment is matched by the parser until that time is reached — at which point ProcessChangeFlag is raised, typically triggering an "end of etch" digital output or similar mechanism. This is followed by another "sized-gap" until gases restabilize, plasma is ignited, and normal etching resumes, to accommodate laser signal artifacts associated with chamber shut-down, process change and reaction restart.
Lastly, to detect and declare endpoint, looping to capture peaks and troughs continues as before. However, after eight peaks and eight troughs the loop is terminated and one further peak counted — placing the etch within one half band of exhausting all TEOS and penetrating nitride. Because no further information in the laser trace is of interest until nitride penetration is observed, another "sized-gap" is matched by the parser, immediately followed by a branch. This branch, however, has no escape path, and hence forces capture of endpoint time and setting of EndpointFlag when the signal value exceeds 102.5% of the last peak or drops below 97.5% of the last trough. As before, raising EndpointFlag pulses (presumably) a digital output or similarly triggers external "end of etch" circuitry for plasma shut-down followed by succeeding (vent, purge and unload) process steps.
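The final endpoint test reduces to a pair of threshold comparisons, sketched below. The sketch assumes a positive-valued signal and that the last peak and last trough values have already been captured; `endpoint_index` and its `margin` parameter are hypothetical names, with the 2.5% margin taken from the percentages quoted above.

```python
from typing import Optional, Sequence


def endpoint_index(samples: Sequence[float], last_peak: float,
                   last_trough: float, margin: float = 0.025) -> Optional[int]:
    """Index of the first sample above 102.5% of the last peak or below
    97.5% of the last trough (with the default margin), else None."""
    upper = last_peak * (1.0 + margin)
    lower = last_trough * (1.0 - margin)
    for i, value in enumerate(samples):
        if value > upper or value < lower:
            return i
    return None


# Samples arriving after the final sized gap; values are made up for illustration.
print(endpoint_index([50.0, 60.0, 99.0, 101.0, 103.0], last_peak=100.0, last_trough=40.0))  # 4
```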
Conformance decisions
When a heartbeat signal and complex decision algorithm exist, univariate conformance decisions (i.e., fault detection) reduce to the null evaluation case for each stage of the decision algorithm. Hence, if film change were not detected before five peaks and five troughs, a process-level fault of "Film Change Not Detected" would be issued to the equipment controller (if configured to deal with such exceptions) or its station controller. In either event, an operator assist would be required and control reverted to either manual or strictly timed etching for the remainder of the failing wafer as well as (presuming continued processing was allowed at all) its parent lot. This would continue until the fault's origin was discovered and failure resolved to allow resumption of normal processing.
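As a minimal sketch of this null evaluation case, the check below raises the fault named in the text once five peaks and five troughs have passed without film change being detected. The function name and argument names are assumptions made for illustration.

```python
from typing import Optional


def film_change_fault(peaks_seen: int, troughs_seen: int,
                      film_change_detected: bool, limit: int = 5) -> Optional[str]:
    """Return a fault message when film change is overdue, else None."""
    overdue = peaks_seen >= limit and troughs_seen >= limit
    if overdue and not film_change_detected:
        return "Film Change Not Detected"
    return None


print(film_change_fault(5, 5, film_change_detected=False))  # Film Change Not Detected
print(film_change_fault(4, 4, film_change_detected=False))  # None
```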
Similar non-conformance events would be declared given inability to calculate process change falling within a given time of process start or detect endpoint itself — all of which would be implemented by rather modest extension to the controlling decision algorithm. Given no complex decision algorithm, however, but one or more signals whose behavior was reflective of overall process integrity, a template would be devised for each such signal. Conformance would then be positive — hence, the process fault-free — if all signals successfully parsed with their governing characteristic template.
Given the fundamental objective of finding problems before they happen, one is best advised to engage the most fluidly applied and least data hungry means to accomplish that end. Numerical techniques of a relatively simplistic nature operated on a broad spectrum of single variables lend themselves well to this, bringing forth indications of equipment flaws and control degradation at least as soon as tuned recognition templates (quite possibly sooner) and almost undoubtedly faster than envelope fitting techniques.
Where an underdamped system and consequent ringing control loop might fit well within an envelope of normality and be insufficiently pronounced for immediate detection by a recognition template, standard deviation of the controlled parameter would telltale problems quickly. In general, slope, standard deviation, mean, range and min/max magnitude work well as numerical detection arguments with single variables, particularly when numerically detected issues will be further analyzed by pattern recognition.
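The numerical detection arguments listed above are simple to compute over a window of samples. The sketch below is one possible formulation, assuming slope is taken as the least-squares fit of value against time and standard deviation as the sample standard deviation; the function name `summarize` and the returned keys are illustrative.

```python
import statistics
from typing import Dict, Sequence


def summarize(times: Sequence[float], values: Sequence[float]) -> Dict[str, float]:
    """Compute slope, standard deviation, mean, range, min and max for one
    window of a single variable. Needs at least two samples at distinct times."""
    n = len(values)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return {
        "slope": sxy / sxx,                     # least-squares slope
        "std_dev": statistics.stdev(values),    # sample standard deviation
        "mean": v_mean,
        "range": max(values) - min(values),
        "min": min(values),
        "max": max(values),
    }


stats = summarize([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(stats["slope"], stats["mean"], stats["range"])  # 2.0 4.0 6.0
```

Each measure would typically be active only over the process step where it is meaningful, for example slope during pumpdown and standard deviation during a steady-state etch.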
As an example, consider the pressure control scenario depicted (by real data) in Figure 5 and Figure 6. Figure 5 displays pumpdown from load lock to process pressure, while Figure 6 illustrates pressure behavior over a bulk etching step. Note both the spike during pumpdown and clearly marginal pressure control quality throughout bulk etch. Defining numerical detection measures of slope (which often highlights rate of rise/descent issues) to be active during the pumpdown period and standard deviation (a classic for picking up measurement and control degradation problems) as active over bulk etch readily fingers problem wafers from a split-lot of 12 (Table).
Table. Problem Wafers During Pumpdown and Over Bulk Etch
During pumpdown
| 328.0 | ABORT | semicon.txt | 1 | 12345 | AB67 | initial-slope-pressure_cham | >40.0 | Value: 53 |
| 2350.0 | WARN | semicon.txt | 8 | 12345 | AB67 | initial-slope-pressure_cham | <10.0 | Value: 5 |
| 3567.0 | WARN | semicon.txt | 12 | 12345 | AB67 | initial-slope-pressure_cham | <10.0 | Value: 7.5 |
Over bulk etch
| 838.0 | WARN | semicon.txt | 3 | 12345 | AB67 | std-dev-pressure_cham | >4.0 | Value: 4.9805536 |
| 1141.0 | WARN | semicon.txt | 4 | 12345 | AB67 | std-dev-pressure_cham | >4.0 | Value: 4.7696843 |
| 3572.0 | WARN | semicon.txt | 12 | 12345 | AB67 | std-dev-pressure_cham | >4.0 | Value: 4.407117 |
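Limit checks of the kind reported in the table can be sketched as a small lookup. The measure names, limits and severities below are copied from the table rows; the `LIMITS` structure and the `check` function are hypothetical and do not represent how the original software was configured.

```python
import operator
from typing import List, Tuple

# (measure, comparison, limit, severity), as they appear in the table.
LIMITS = [
    ("initial-slope-pressure_cham", ">", 40.0, "ABORT"),
    ("initial-slope-pressure_cham", "<", 10.0, "WARN"),
    ("std-dev-pressure_cham", ">", 4.0, "WARN"),
]
OPS = {">": operator.gt, "<": operator.lt}


def check(measure: str, value: float) -> List[Tuple[str, str]]:
    """Return (severity, violated limit) pairs for one computed measure."""
    return [
        (severity, f"{op}{limit}")
        for name, op, limit, severity in LIMITS
        if name == measure and OPS[op](value, limit)
    ]


print(check("initial-slope-pressure_cham", 53))     # [('ABORT', '>40.0')]
print(check("initial-slope-pressure_cham", 7.5))    # [('WARN', '<10.0')]
print(check("std-dev-pressure_cham", 4.407117))     # [('WARN', '>4.0')]
```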
Nothing was broken in either scenario — pumping speed overcame the chamber pressure glitch (typically an artifact of damaged or poorly seating chamber seals), and mean pressure remained on target (although this control loop needs tuning attention). But it's obviously time to find a hole in WIP and resolve both issues before they put a tool down and a lot in jeopardy.
Once equipment-level issues are numerically detected, template-driven recognition techniques become an invaluable tool in their elimination. Consider once again the pressure glitch of Figure 5, which at this point is known only as a pumpdown sequence showing excessive initial slope. Applying a "Pressure-Glitch" template, which is nothing more intricate than a "Band" template whose internal parameters have been mechanistically trained to recognize sharp spikes but reject other signal behavior, yields the discovery of Figure 7.
Pragmatically speaking, the "Pressure-Glitch" template would probably be one of a library of pre-fabricated templates run against signal data whenever a problem — and hence incipient failure — was numerically detected. Recognition would then classify the failure mode and directly indicate what to repair.
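To give a feel for what a spike-recognizing template does, the sketch below uses a hand-written rule in place of a trained template: it reports an excursion when the signal rises by at least `min_height` and then returns to its pre-rise level within `max_width` samples. The function name, both parameters and the sample values are assumptions chosen for illustration, and a trained "Band" template would be considerably more tolerant of noise.

```python
from typing import List, Sequence, Tuple


def find_spikes(samples: Sequence[float], min_height: float,
                max_width: int) -> List[Tuple[int, int, int]]:
    """Return (start, peak, end) index triples for sharp, narrow excursions."""
    spikes: List[Tuple[int, int, int]] = []
    n = len(samples)
    i = 0
    while i < n - 1:
        if samples[i + 1] <= samples[i]:
            i += 1
            continue
        base = samples[i]  # level just before the rise
        j = i + 1
        peak = j
        # Follow the excursion until the signal returns to its pre-rise level.
        while j < n and samples[j] > base:
            if samples[j] > samples[peak]:
                peak = j
            j += 1
        returned = j < n
        if returned and samples[peak] - base >= min_height and j - i <= max_width:
            spikes.append((i, peak, j))
        i = j
    return spikes


# A falling pumpdown pressure trace with one sharp glitch at index 3.
pumpdown = [100.0, 90.0, 80.0, 95.0, 78.0, 70.0, 60.0]
print(find_spikes(pumpdown, min_height=10.0, max_width=3))  # [(2, 3, 4)]
```

A broad, slow rise fails the width test and is rejected, which mirrors the requirement that the template recognize sharp spikes but reject other signal behavior.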
Conclusions
In either event, fault conditions can be detected, identified and reacted to at a process (for control purposes) or equipment (for fault detection and incipient problem repair) level. Hence, where quality experts advocate that one "listen to their parts" [4], manufacturers engaging pattern recognition can listen intelligently to the process that created those parts as well as the equipment that orchestrated the process. Although not a "something for nothing" proposition (pattern recognition techniques require thinking to set up ... one must have intelligence to encapsulate), manufacturers making the investment look for quality issues and opportunities in their processes and equipment rather than exclusively on their parts. This, in turn, creates "virtual sensors" whose return is not academic.
Robert H. McCafferty independently operates RHM Consulting as the North American agent for Curvaceous Software Ltd. He began his work in semiconductors at IBM Microelectronics (Burlington, Vt.). From there he consulted for a subsidiary of Bolt, Beranek and Newman (BBN), which became part of Brooks Automation, where he specialized in semiconductor and pattern recognition. He has a B.S. and M.S. in mechanical engineering, and an M.S. in computer science from the University of Virginia.
Phone: 1-203-270-1626
e-mail: bobmccaf@earthlink.net
REFERENCES
1. J. Baliga, "Advanced Process Control: Soon to be A Must," Semiconductor International, July 1999.
2. R.H. McCafferty, "Etch Endpoint Detection Via Pattern Recognition," Proceedings of the SEMATECH AEC/APC Symposium XI, Vail, Colorado, September 1999, p. 871.
3. Brooks Automation Inc., Patterns 3.5 Software. Chelmsford, Mass., 1999.
4. K.R. Bhote, World Class Quality. New York: AMACOM division of American Management Association, 1991.