Michel Baudin, MMTI, Palo Alto
Vijay Mehrotra, PhD Candidate in O.R., Stanford Univ.
Barclay Tullis, Hewlett Packard Corporation, Palo Alto
Don Yeaman, Briner/Yeaman Engineering, Santa Clara
Randall A. Hughes, TYECIN Systems, Inc., Mountain View
Today, most IC factories in the world still use spreadsheets (or spreadsheet logic embedded in mainframe software) as their only tool for capacity planning and manufacturing performance analysis. These spreadsheets are used to design $300M factories, make decisions on multimillion dollar pieces of equipment, assign special personnel, and support planning decisions over horizons of a year or more.
Spreadsheet logic can establish a minimum number of machine hours needed to sustain a particular steady-state work load. On the other hand, it is incapable of determining a sufficient number. In particular, it cannot decide whether work that appears feasible in terms of gross machine hours can actually be done through coordinated actions by operators, maintenance technicians, materials handling systems and process equipment.
We illustrate the differences between spreadsheet and simulation models through the examples of diffusion cell staffing, material flows through a process segment, and the operations of a full fab. The results show that engineering judgment applied to spreadsheet outputs is not the most prudent solution when more realistic tools are available, and that by overstating capacity, spreadsheet logic can lead to a lack of key equipment and space. Using similar input data, simulation models provide capacity estimates, as well as WIP and cycle time predictions, which take into account the interactions between lots, machines, and people.
Generally speaking, the spreadsheet model is inadequate to model the coordinated use of several types of resources. Until recently, the spreadsheet was still preferred in spite of its limitations because of the vast amounts of computing resources required for simulation runs. The advent of cheap and powerful workstations has made this a moot point.
While most IC factories in the world still rely on spreadsheets, simulators have moved from the research phase to early industrial application. The specific achievements that motivated this group of authors to share our experience are (1) that TYECIN's ManSim has found its way into the standard tool kit of fab designers at Briner/Yeaman Engineering, and (2) that it has been used in support of business decisions at HP (see [Tul90]). The literature on simulations in IC manufacturing, from [Bur86] to [Law90], describes research projects aimed at proving the usefulness of the technique. Simulators are still not on every planner's desk the way 1-2-3 is, but they are no longer used in a pure "technology push" mode.
First, we review various approaches, from "back-of-the-envelope" calculations to discrete-event simulations, highlighting what can be expected of them as a function of the effort required. Then, we examine the results of using these techniques on three examples:
1. Operator job design.
2. Capacity analysis on a small process segment.
3. Capacity analysis on a full-scale fab.
Finally, we present recommendations for practitioners on the implementation of these techniques, and describe their place in the evolution of engineering tools.
We consider four approaches to capacity planning and performance modeling. Our perception of their value is summarized in Figure 1. Our "gain vs. pain" plot represents a qualitative assessment of the value of the answers obtained as a function of the amount of user effort needed to get them. Our reasons for the placement of each method on the chart are explained in the coming paragraphs.
Figure 1. Gain vs. pain in performance analysis
The ability to calculate orders of magnitude on the back of an envelope is a precious skill in any field. In a wafer fab case, the arguments might go as follows:
This type of calculation can prove that a capacity plan is infeasible, but it cannot prove that it is feasible. In the first case, if you put 15 steppers in, you will clearly not have enough capacity, but if you put 30 in, there is no guarantee that the fab will be able to process 5000 wafers per week.
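The envelope arithmetic behind such a claim can be sketched numerically. All figures here (masking layers, gross stepper throughput, uptime) are hypothetical, chosen only to make the reasoning concrete; they are not taken from the fab discussed in the text.

```python
# Back-of-the-envelope stepper count for a 5000 wafer/week fab.
# All parameter values below are hypothetical illustrations.
wafers_per_week = 5000
masking_layers = 12      # stepper visits per wafer (assumed)
wafers_per_hour = 30     # gross stepper throughput (assumed)
hours_per_week = 168
uptime = 0.7             # fraction of time the stepper is up (assumed)

stepper_hours_needed = wafers_per_week * masking_layers / wafers_per_hour
stepper_hours_per_machine = hours_per_week * uptime
steppers_needed = stepper_hours_needed / stepper_hours_per_machine
print(round(steppers_needed, 1))  # -> 17.0
```

Under these assumed numbers, 15 steppers are provably insufficient; the calculation says nothing, however, about whether 17 or even 30 steppers would actually deliver 5000 wafers per week once interactions between resources are accounted for.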
While extremely useful, the relationship WIP = throughput x cycle time does not tell you how to control any of its terms other than in relation to the other two. A WIP level of 1200 lots may be incompatible with a throughput of 200 lots/week, particularly with the constraint that this throughput be of a desired product mix.
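The arithmetic of this relationship (Little's Law) is trivial; the point is that fixing any two of the quantities determines the third, whether or not the implied value is acceptable:

```python
# WIP = throughput x cycle time: given any two terms, the third follows.

def implied_cycle_time(wip_lots, throughput_lots_per_week):
    """Cycle time (in weeks) implied by a WIP level and a throughput."""
    return wip_lots / throughput_lots_per_week

# The text's example: 1200 lots of WIP at 200 lots/week implies a
# 6-week cycle time, whether or not 6 weeks is acceptable.
print(implied_cycle_time(1200, 200))  # -> 6.0
```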
What can be done with Lotus 1-2-3 or Microsoft EXCEL is a natural extension of the back-of-the-envelope calculations. Spreadsheets provide an easy way to create, store and retrieve formulas characterizing relationships between process and production parameters. Spreadsheets make it possible to quickly assess the impact of making changes, but they do not conceptually solve any of the problems that manual calculations cannot. In the context of our earlier examples, spreadsheets would make it easy to ask:
A spreadsheet program emulates the paper worksheets and universal planning forms that once were the staple of office work. While not prescribing an approach, these programs suggest one by making it easy, for example, to multiply columns by constant factors and to add up rows. 1-2-3 and EXCEL in fact have elaborate macro capabilities that can turn them into makeshift development tools in the hands of sophisticated users, but the rest of the world identifies them with the simple, linear data manipulation capabilities that have made them popular.
The planning tools that are sold as part of CIM packages such as WorkStream or PROMIS retrieve input parameters from a plant database, but everything they do is conceptually feasible with spreadsheets. In essence, they offer spreadsheet logic embedded in mainframe software. Their integration advantage is largely unappreciated: as long as they are going to use the same logic, planners generally prefer the spreadsheet programs.
Spreadsheet models are commonly used to analyze the capacity of a group of machines, a work area, or an entire fab. Typically, one uses the spreadsheet to study each machine or group of machines in isolation to determine its capacity. The individual station capacity estimates are then used to determine an estimate for fab capacity. Below, we describe a typical spreadsheet model of a fab:
Machine time = Work time x Machine count x Uptime ratio
Lot time = Time per visit x Number of visits x (1+Rework)
Lot capacity = (Machine time available/Lot time) x Lots per load
A load is defined as the set of wafers that is processed in one machine at one time.
Wafer capacity = Lot capacity x Wafers/lot x Yield
In the spreadsheet, these parameter names become column headers and the formulas relating them are copied over multiple rows. It is then a simple matter to tally the results over a whole process or the whole equipment set of the fab. If average values are used for all parameters, then requirements for peak work loads will be underestimated. If peak values are used everywhere, they may be overestimated because peaks do not occur everywhere at once. Finally, this type of analysis will not give any more answers than back-of-the-envelope calculations on WIP and cycle time. To get an answer on either one, you must input the other one.
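One row of such a spreadsheet can be sketched as follows. The parameter values are hypothetical, chosen only to make the arithmetic concrete, and a load is assumed to hold two lots processed together in one machine cycle:

```python
# A sketch of one spreadsheet row from the model above.
# All parameter values are hypothetical illustrations.
work_time      = 168.0   # hours of scheduled work per week
machine_count  = 3
uptime_ratio   = 0.85
time_per_visit = 2.0     # hours of machine time per load, per visit
visits         = 4       # visits per lot to this workstation
rework         = 0.05    # rework fraction
lots_per_load  = 2       # a load holds this many lots at once
wafers_per_lot = 24
yield_ratio    = 0.90

machine_time = work_time * machine_count * uptime_ratio        # 428.4 h/week
lot_time     = time_per_visit * visits * (1 + rework)          # 8.4 h per load
lot_capacity = machine_time / lot_time * lots_per_load         # lots/week
wafer_capacity = lot_capacity * wafers_per_lot * yield_ratio   # wafers/week

print(round(lot_capacity, 1), round(wafer_capacity, 1))  # -> 102.0 2203.2
```

Copying this row for every workstation and taking the minimum over rows is, in essence, the whole spreadsheet capacity model.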
Queuing network models, a.k.a. "flow-and-queue" models, attempt to answer some of the questions left open by spreadsheets. They aim to predict the hitherto elusive WIP levels and cycle time, given a steady-state work load, process routes and simple resource allocation rules.
They have been tried as a means of analyzing manufacturing systems, mostly by academics and with limited success in industry. We can identify two reasons for this. First, it is not established that a theory developed to model service systems ranging from airport check-in counters to computer networks is a good fit to manufacturing. Second, even if this approach can technically be made to work, it requires an excessive level of mathematical sophistication on the part of the user. Those who have the mathematical background to understand it usually do not work in production control.
Queuing theory was developed to design and analyze service systems such as luggage check-in counters at airports, where both the arrival rate of customers and their service times are random. The analogy between customers at service windows and parts in front of machines is tempting but not necessarily helpful. Mathematically tractable queuing models assume that, if you know the present state of the system, its future does not depend on its past. If the man in front of you at the booth has been there for 15 minutes, you assume he will leave soon, but the queuing model doesn't. It only knows that the man is there; whether he has been there one minute or 15 makes no difference to the way the model anticipates his future behavior.
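This "memoryless" assumption can be checked numerically for the exponential distribution that underlies tractable queuing models. In the sketch below, service times have an assumed mean of 10 minutes; among services that have already lasted 15 minutes, the additional time remaining still has a mean of about 10 minutes, exactly as the booth example describes:

```python
import random

# Memorylessness of the exponential distribution: the remaining service
# time does not depend on how long service has already lasted.
# The 10-minute mean is an assumption for illustration.
random.seed(1)
mean = 10.0
samples = [random.expovariate(1.0 / mean) for _ in range(200_000)]

overall_mean = sum(samples) / len(samples)

# Condition on services that have already lasted 15 minutes and look at
# the time still remaining beyond that point.
remaining = [t - 15.0 for t in samples if t > 15.0]
conditional_mean = sum(remaining) / len(remaining)

print(round(overall_mean, 1), round(conditional_mean, 1))  # both near 10
```

For real machines, where a repair that has been under way for hours is genuinely closer to completion, this property is exactly what fails to hold.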
In manufacturing, this assumption is clearly far-fetched. Where it holds, a high equipment utilization can only be achieved at the expense of high WIP and long cycle times. The level of randomness implied may be found in an R&D pilot line. A production facility where it would be present, on the other hand, would not be competitive. A well-debugged linked lithography line, where wafers are synchronized and move according to a regular beat is an example of high utilization achieved with minimum WIP and theoretical cycle time.
Even though these models are at best a fair fit to most manufacturing problems, the same is true for spreadsheets. Technically, it should be possible to use them for worst-case scenarios and to provide order-of-magnitude estimates for quantities such as WIP and cycle time. To be useful, however, those estimates would have to be understood and believed by the decision makers. So far, this barrier has not been surmounted. No one has yet found a way to explain the approach without its mathematical underpinnings, or to communicate those effectively to manufacturing personnel.
By comparison with queuing networks, the concept of discrete-event simulation is simple. TYECIN trainers estimate that, on the first try, they make themselves understood by 75% of audiences made up of production control personnel, equipment engineers and fab supervisors. The idea is to treat fab operations like a board game, where various agents make moves in turn that can be disrupted by chance in the form of equipment failures or delays.
Simulation models hold images of the machines, lots, operators, and other components of the fab and emulate the events that affect them, including lot releases, operation starts and ends, machine breakdowns and repairs, preventive maintenance, shift starts and ends, and breaks. Simulation models are a "brute force" computer recreation of the events that take place in the fab over a period of time.
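At its core, the board-game mechanism is nothing more than a clock and a time-ordered event list. The minimal sketch below, with a hypothetical two-step flow (2 hours of etch followed by 1.5 hours of implant), shows the pattern; a real simulator like ManSim adds resources, contention, and random disruptions on top of this loop:

```python
import heapq

# Minimal discrete-event loop: events are kept in time order, and each
# handler may schedule further events.  The flow and times are hypothetical.
events = []  # heap of (time, sequence, description)
seq = 0

def schedule(t, what):
    """Put an event on the time-ordered event list."""
    global seq
    heapq.heappush(events, (t, seq, what))
    seq += 1

def run():
    """Pop events in time order until none remain; return the history."""
    log = []
    while events:
        t, _, what = heapq.heappop(events)
        log.append((t, what))
        if what == "lot arrives":
            schedule(t + 2.0, "etch done")     # 2.0 h etch step (assumed)
        elif what == "etch done":
            schedule(t + 1.5, "implant done")  # 1.5 h implant step (assumed)
    return log

schedule(0.0, "lot arrives")
schedule(4.0, "lot arrives")
history = run()
print(history)
```

The `history` list produced here is exactly the "simulated history" discussed below: a record of time-stamped events to be analyzed as a real shop-floor log would be.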
Discrete-event simulation can be done manually, using a blueprint of a proposed shop layout and moving paper dolls made from Post-it notes cut to scale, to visualize the dynamics of operations (See Example 1). The setup effort is small and the results quite useful, albeit of limited scale. The same logic is applied in computer simulations for long-term performance analysis or short interval scheduling.
The computing resources and time required to run simulations used to be the major drawback of the technique. Tying up a mainframe computer for seven hours was not deemed worthwhile, when many such runs would have been needed just to validate a model. The advent of cheap and powerful workstations has reduced the turnaround time on simulations to a few minutes, and thereby changed the rules.
However, the efforts required to set up and validate the models, and to interpret the simulation results remain substantial, and should be undertaken with a clear and useful purpose.
Simulations are used for performance analysis in two steps, as shown in Figure 2. First, the simulator executes events such as lot starts and moves that change a modeled state of the factory, generating a simulated history. Second, that history is analyzed as a real one would be, to obtain (1) measures of throughput, cycle time and inventory for products and processes, and (2) measures of equipment utilization and queue lengths.
Figure 2. Discrete-event simulation applied to performance analysis
Inputs to simulation models include information about process recipes, equipment, operators, and the fab work calendar. Events with little or no variability in their length, including many process recipe steps and measurements, are modeled as taking a fixed amount of time. Events that have significant randomness, such as equipment failures, are timed by the computer "rolls of the dice" from mean times between failures (MTBF) based on historical data.
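The "rolls of the dice" can be sketched as draws from an exponential distribution whose mean is the historical MTBF. This is a common modeling choice rather than necessarily the one ManSim uses, and the 48-hour MTBF below is a hypothetical value:

```python
import random

# Simulated times between failures, drawn from an exponential
# distribution with the historical MTBF as its mean.
# The 48-hour MTBF is an assumed, illustrative figure.
random.seed(42)
MTBF = 48.0  # hours, from historical data (assumed)

def time_to_next_failure():
    """One 'roll of the dice': hours until the machine next fails."""
    return random.expovariate(1.0 / MTBF)

# Over many draws the sample mean recovers the input MTBF.
draws = [time_to_next_failure() for _ in range(100_000)]
print(round(sum(draws) / len(draws), 1))  # close to 48
```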
Simulations provide many outputs. The results include estimates of the following parameters:
Unlike spreadsheets, simulations do not examine the fab in terms of individual pieces of equipment. Instead, they look at the whole fab as a system, explicitly modeling the relationships and dependencies between different types of lots, equipment, and operators. In addition, simulation models are dynamic, examining how the fab runs over a specific period of time while taking into account dispatching rules, lot priorities, batching rules, and end-of-shift effects.
The design of a semiconductor production facility is usually done in four weeks. The group in charge of designing the plant has no opportunity to communicate with those who will operate it after it is built. This shows in the resulting facilities, in the form of imbalances in operator utilizations and arcane material flows. Some operators are constantly busy while others are idle most of the time. There is more at stake in having only a few operators, all fully utilized, than saving labor costs. Bored operators worry about the security of their jobs, and performing poorly designed tasks hurts their morale. Furthermore, management has at best a loose understanding of production capacities and labor time requirements.
Various types of simulations have a key role to play in designing a plant around operators, and in arranging for simple material flows that map naturally to the process sequence. For this purpose, simulations with paper dolls on blueprints are more effective than software systems. They can be carried out interactively in group sessions involving experienced production supervisors and operators. The precise capacity information gathered as a result can then be passed on to higher-level tools addressing the issues covered in Examples 2 and 3.
Figure 3 shows the layout of a section of a diffusion area. Several questions need to be asked about the roles operators have to play in it:
Figure 3. Diffusion area layout
First, we list operator tasks as in Figure 4, separating for each one the time when manual intervention is required from unattended machine operation.
Figure 4. Tasks at diffusion station
We can then string together these tasks in a form similar to a Gantt chart, but with connections reflecting the operator's walking time between machines, and with the separation between manual and unattended operation. We obtain the kind of task plan shown in Figure 5, which represents a repeating cycle of operator activities, with movements symbolically shown in Figure 6. The sequence can be changed, and some of the equipment can be moved. The consequences of such actions can be anticipated through this analysis.
Figure 5. Operator task plan
The flow we use in example 2 is shown in Figure 7. It is intended to be the simplest possible example highlighting additional problems that spreadsheet models do not typically account for. It is nonetheless a real example in the sense that we have excerpted the sequence of steps from an actual process recipe. In it are found loops, varying load sizes, and unreliable equipment. On the other hand, we assumed that operators were available in large numbers.
Figure 7. Excerpt from an IC process flow
Following are a few examples of the problems that the simulation illustrates:
We compare the fab capacity estimates from a simulation model with the predictions made by a spreadsheet model. For this simple example, the spreadsheet model described above predicted the capacity of the fab, defined by the workstation with the smallest estimated capacity, to be over 5700 wafers per week.
Using TYECIN's ManSim package, we ran several simulation runs of 100 days, using different random number seeds and varying start rates. The output from these runs predicted that the fab could produce no more than 4200 wafers per week.
In the following paragraphs, we explain the discrepancies between the results of the two approaches:
In addition, a similar input value must be supplied to somehow account for time spent on equipment setups. In a real fab, however, partial loads may occur, and lots often wait while equipment is set up. Batching and setup decisions may be based on established rules, on a supervisor's knowledge, or on a case-by-case basis, but are in any case difficult to distill down to the simple input parameter values required for a spreadsheet. In contrast, simulation models allow the user to specify rules about when to batch lots and perform changeovers that come much closer to what actually happens. This example features a multi-lot processor (F_nox) which will wait only a certain amount of time before running a partial load, as well as two workstations (implant and F_nox) which perform changeovers based on lot queue times.
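A dispatching rule of the kind described for F_nox can be stated in a few lines. The load size and wait limit below are hypothetical, but the structure (run a full load at once, or a partial load only after the oldest lot has waited long enough) is the kind of logic a simulator can execute and a spreadsheet cannot:

```python
# Sketch of a partial-load batching rule for a multi-lot processor.
# MAX_LOAD and MAX_WAIT are hypothetical, illustrative values.
MAX_LOAD = 4      # lots per full load
MAX_WAIT = 1.0    # hours the oldest queued lot may wait for a full load

def should_start(queued_lots, oldest_queue_time):
    """Decide whether the tool should start a run now."""
    if queued_lots == 0:
        return False
    if queued_lots >= MAX_LOAD:
        return True                        # full load: start immediately
    return oldest_queue_time >= MAX_WAIT   # partial load after the wait limit

print(should_start(4, 0.1), should_start(2, 0.5), should_start(2, 1.2))
# -> True False True
```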
The type of problem we address in Example 3 is illustrated in Figure 8. We want to assess the ability of a fab with the proposed layout and equipment set to withstand various work loads. We are putting ourselves in the shoes of a fab designer trying to anticipate the behavior of a fab that is yet to be built. In this application, "model validation" does not have the same meaning as for an operating fab. The point is not to have simulated performance match actual performance, but to guess better than with other methods.
Figure 8. A full fab layout
This example has:
The spreadsheet model estimates equipment capacity by approximating the number of visits per lot as a weighted average of the number of visits per process to a given piece of equipment. Similarly, the processing time is estimated as a weighted average of the processing times over the various visits to that equipment. The spreadsheet model is unable to take into account batching of different types of lots together, time spent on equipment set-up due to multiple products in the fab, and the effect of lot priorities on throughput.
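The weighted-average aggregation can be made concrete with two hypothetical processes sharing one workstation; the lot-start rates, visit counts, and times per visit below are illustrative, not taken from the fab in Figure 8:

```python
# Spreadsheet-style aggregation of two processes at one workstation.
# All figures are hypothetical illustrations.
processes = [
    # (weekly lot starts, visits per lot to this tool, hours per visit)
    (100, 6, 1.2),   # process A
    ( 50, 4, 2.0),   # process B
]

total_starts = sum(starts for starts, _, _ in processes)
total_visits = sum(starts * visits for starts, visits, _ in processes)

# Visits per lot, weighted by lot starts.
avg_visits = total_visits / total_starts
# Hours per visit, weighted by number of visits.
avg_time = (sum(starts * visits * hours for starts, visits, hours in processes)
            / total_visits)

print(round(avg_visits, 2), round(avg_time, 2))  # -> 5.33 1.4
```

The averaging erases exactly the information (which lot type is at the tool, and in what order) that batching, setup, and priority rules depend on, which is why the spreadsheet cannot model them.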
The simulation includes multiple products, and multiple types of lots contending for resources. Hot lots, batching and dispatch rules are modeled explicitly. Briner/Yeaman Engineering puts together such models on a routine basis as part of fab design, in less than one week. Most significantly, for a fab with multiple types of products, simulation models produce outputs such as cycle time and WIP estimates for each product type. Spreadsheets cannot provide any information whatsoever about these performance measures, which do impact the sizing of the materials handling systems.
The spreadsheet model, created in Microsoft Excel, analyzes each of the workstations in the fab individually, as described above, and shows the stepper as the workstation with the smallest production capacity. The capacity of the stepper workstation, and thus of the fab, is estimated to be 1275 wafers per week. The simulation model, built with ManSim, was run 10 times for 100 days each time, with a different random number seed for each run. The simulation also identified the stepper workstation as the primary bottleneck. However, the simulation model estimates fab capacity to be only 974 wafers per week, or about 25% less than the spreadsheet predicts.
In addition to the reasons listed in Example 2, the factors below help explain this discrepancy:
Table: outs per week (wafers), with average cycle time, queue time, and processing time (hours)
Many fabs lack the operational information that bridges the gap between a run sheet and a shop layout. This information is generated as a by-product of the manual simulations illustrated in Example 1. In an existing fab, those are worthwhile projects for small group activities of operators. For new fabs, more time should be allocated at the design stage to accommodate this type of analysis. Unless this is done, not only will key input parameters be missing, but the minds of the users will not be ready to accept the outputs of computer simulations.
A simulation study should be undertaken to answer questions that are formulated clearly ahead of time, such as: "If we agree to add this product to our mix, how will it affect our delivery performance for all the others?" On the other hand, it may not be advisable to expend too much energy simulating reworks. Like fumbling a football, reworking wafers is an activity that everyone knows would best be dispensed with. Whatever resources are available are better spent avoiding it than getting a precise estimate of the damage it is causing.
Build minimal models
If you have a problem that can be solved on the back of an envelope, do it. Otherwise, if a spreadsheet will work, use it. As we have seen above, there are answers a spreadsheet model will not provide. To get these, use a simulation, and build the smallest model that will provide the desired answer. If a concept can be checked out with a two-station network, don't try it on a full fab model. In many cases, useful qualitative results can be obtained with a simplified model. If, on the other hand, a simulation is intended to be used for short interval scheduling, then it must accurately reflect many details of the fab.
Choose effective computer tools
Tullis [Tul89] discusses the experience of HP's R&D facility in this regard. His conclusion was that simulation languages like SLAM-II or SimScript required so much setup and model maintenance effort that a productive relationship with end-users could not be sustained, and that a special-purpose, table-driven simulation system like ManSim was more effective.
Back-of-the-envelope calculations, spreadsheets, and queuing networks all entail directly calculating the parameters of models. Like almost all applied mathematics, these approaches use exclusively linear methods, not because they fit but because they are tractable. The dynamics of product movement from machine to machine through process flows are intrinsically non-linear. The behavior of a large production system cannot be deduced simply from that of its component subsystems using linear models. Attempts to do so result in entering as inputs the answers one should get as outputs and protecting oneself against variability by fudge factors. In this as in other fields of engineering, simulation bypasses the need for simplistic assumptions.
We cannot have an equation or even an algorithm to calculate what we are after, but we can build an electronic replica of the fab, make the computer play it like a board game, and get our answers from the score sheet. Generally speaking, spreadsheets are inadequate to model the coordinated use of several types of resources, as well as the effect of frequent changeovers. Until recently, the spreadsheet was still preferred in spite of its limitations because of the vast amounts of computing resources needed by simulators. The advent of cheap and powerful workstations has made this a moot point.
[Bau90] Baudin, Michel, Manufacturing Systems Analysis, Prentice Hall, 1990, Chapters 19 and 22
[Bur86] Burman, David et al., Performance analysis techniques for IC manufacturing lines, AT&T Technical Journal, Vol. 65, Issue 4, pp. 46-57, 1986
[Cer88] Cernault, Argan, La simulation des systèmes de production, Cépaduès-Editions, 1988
[Law90] Lawton, James W. et al., Workload regulating wafer release in a GaAs fab facility, ISMSS, 1990
[Mil90] Miller, David J., Simulation of a semiconductor manufacturing line, Communications of the ACM, Vol. 33, No. 10, pp. 89-108, 1990
[Tul90] Tullis, Barclay et al., Successful modeling of a semiconductor R&D facility, ISMSS, 1990