**Michel Baudin**, MMTI, Palo Alto

**Vijay Mehrotra**, PhD Candidate in O.R., Stanford Univ.

**Barclay Tullis**, Hewlett Packard Corporation, Palo Alto

**Don Yeaman**, Briner/Yeaman Engineering, Santa Clara

**Randall A. Hughes**, TYECIN Systems, Inc., Mountain View

Today, most IC factories in the world still use spreadsheets -- or spreadsheet logic embedded in mainframe software -- as their only tool for capacity planning and manufacturing performance analysis. These spreadsheets are used to design $300M factories, make decisions on multimillion dollar pieces of equipment, assign special personnel, and support planning decisions over horizons of a year or more.

Spreadsheet logic can establish a minimum number of machine hours needed to sustain a particular steady-state work load. On the other hand, it is incapable of determining a sufficient number. In particular, it cannot decide whether work that appears feasible in terms of gross machine hours can actually be done through coordinated actions by operators, maintenance technicians, materials handling systems and process equipment.

We illustrate the differences between spreadsheet and simulation models through the examples of diffusion cell staffing, material flows through a process segment, and the operations of a full fab. The results show that engineering judgment applied to spreadsheet outputs is not the most prudent solution when more realistic tools are available, and that by overstating capacity, spreadsheet logic can lead to a lack of key equipment and space. Using similar input data, simulation models provide capacity estimates, as well as WIP and cycle time predictions, which take into account the interactions between lots, machines, and people.

Generally speaking, spreadsheet models are inadequate for representing the coordinated use of several types of resources. Until recently, the spreadsheet was still preferred in spite of this limitation because of the vast amounts of computing resources required for simulation runs. The advent of cheap and powerful workstations has made this a moot point.

About this paper

While most IC factories in the world still rely on spreadsheets, simulators have moved from the research phase to early industrial application. The specific achievements that motivated this group of authors to share our experience are (1) that TYECIN's ManSim has found its way into the standard tool kit of fab designers at Briner/Yeaman Engineering, and (2) that it has been used in support of business decisions at HP (see [Tul90]). The literature on simulations in IC manufacturing, from [Bur86] to [Law90], describes research projects aimed at proving the usefulness of the technique. Simulators are still not on every planner's desk the way 1-2-3 is, but they are no longer used in a pure "technology push" mode.

First, we review various approaches, from "back-of-the-envelope" calculations to discrete-event simulations, highlighting what can be expected of them as a function of the effort required. Then, we examine the results of using these techniques on three examples:

1. Operator job design.

2. Capacity analysis on a small process segment.

3. Capacity analysis on a full-scale fab.

Finally, we present recommendations for practitioners on the implementation of these techniques, and describe their place in the evolution of engineering tools.

We consider four approaches to capacity planning and performance modeling. Our perception of their value is summarized in Figure 1. Our "gain vs. pain" plot represents a qualitative assessment of the value of the answers obtained as a function of the amount of user effort needed to get them. Our reasons for the placement of each method on the chart are explained in the coming paragraphs.

**Figure 1. Gain vs. pain in performance analysis**

The ability to calculate orders of magnitude on the back of an envelope is a precious skill in any field. In a wafer fab case, the arguments might go as follows:

- If you release 5000 wafers every week into a process of 20 mask layers, then you will have to handle on the order of 100,000 photolithography operations every week, which, in 24x7 operation, works out to about 600 per hour. If each stepper does 30 wafers/hour, then you will need at least 20 steppers.
- WIP = throughput x cycle time, and therefore, if you allow 1200 lots of WIP in the fab and 200 come out every week, then the average cycle time is going to be 6 weeks. If you want the cycle time to be 3 weeks, then you must not have more than 600 lots of WIP.
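These two back-of-the-envelope arguments can be written out as a short sketch, using the numbers from the text:

```python
# Back-of-the-envelope sketch using the numbers from the text.

def steppers_needed(wafers_per_week, mask_layers, wafers_per_hour, hours_per_week=168):
    """Lower bound on the stepper count for a given weekly wafer release."""
    litho_ops_per_week = wafers_per_week * mask_layers   # one exposure per layer
    ops_per_hour = litho_ops_per_week / hours_per_week   # 24x7 operation
    return ops_per_hour / wafers_per_hour

def average_cycle_time(wip_lots, throughput_lots_per_week):
    """Little's law: WIP = throughput x cycle time, solved for cycle time."""
    return wip_lots / throughput_lots_per_week

print(round(steppers_needed(5000, 20, 30), 1))   # 19.8 -> at least 20 steppers
print(average_cycle_time(1200, 200))             # 6.0 weeks
```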

This type of calculation can prove that a capacity plan is infeasible,
but it cannot prove that it *is* feasible. In the first case, if you
put 15 steppers in, you will clearly not have enough capacity, but if you
put 30 in, there is no guarantee that the fab will be able to process 5000
wafers per week.

While extremely useful, the relationship WIP = throughput x cycle time does not tell you how to control either of its terms other than in relation to the other two. A WIP level of 1200 lots may be incompatible with a throughput of 200 lots/week, particularly with the constraint that this throughput be of a desired product mix.

What can be done with Lotus 1-2-3 or Microsoft EXCEL is a natural extension of the back-of-the-envelope calculations. Spreadsheets provide an easy way to create, store and retrieve formulas characterizing relationships between process and production parameters. Spreadsheets make it possible to quickly assess the impact of making changes, but they do not conceptually solve any of the problems that manual calculations cannot. In the context of our earlier examples, spreadsheets would make it easy to ask:

- How many steppers will be needed if we release 6000 wafers/week instead?
- If WIP is reduced to 1000 lots, with throughput held at 200 lots/week, what do we expect average cycle time to be?

A spreadsheet program emulates the paper worksheets and universal planning forms that once were the staple of office work. While not prescribing an approach, these programs suggest one by making it easy, for example, to multiply columns by constant factors and add rows. 1-2-3 and EXCEL in fact have elaborate macro capabilities that can turn them into makeshift development tools in the hands of sophisticated users, but the rest of the world identifies them with the simple, linear data manipulation capabilities that have made them popular.

The planning tools that are sold as part of CIM packages such as WorkStream or PROMIS retrieve input parameters from a plant database, but everything they do is conceptually feasible with spreadsheets. In essence, they offer spreadsheet logic embedded in mainframe software. Their integration advantage is largely unappreciated: as long as they are going to use the same logic, planners generally prefer the spreadsheet programs.

Spreadsheet models are commonly used to analyze the capacity of a group of machines, a work area, or an entire fab. Typically, one uses the spreadsheet to study each machine or group of machines in isolation to determine its capacity. The individual station capacity estimates are then used to determine an estimate for fab capacity. Below, we describe a typical spreadsheet model of a fab:

**Work time.** Total number of hours that the fab is operated over a particular period.

**Machine time available.** The machine hours available at a workstation over the period:

Machine time = Work time x Machine count x Uptime ratio

**Lot time.** Average time spent by a lot at a particular workstation:

Lot time = Time per visit x Number of visits x (1 + Rework)

**Lot capacity.** The lot capacity of a workstation is estimated as follows, where a load is defined as the set of wafers that is processed in one machine at one time:

Lot capacity = (Machine time available / Lot time) x Lots per load

**Wafer outs.** The wafer capacity of the workstation is estimated as follows:

Wafer capacity = Lot capacity x Wafers/lot x Yield

In the spreadsheet, these parameter names become column headers and the formulas relating them are copied over multiple rows. It is then a simple matter to tally the results over a whole process or the whole equipment set of the fab. If average values are used for all parameters, then requirements for peak work loads will be underestimated. If peak values are used everywhere, they may be overestimated because peaks do not occur everywhere at once. Finally, this type of analysis will not give any more answers than back-of-the-envelope calculations on WIP and cycle time. To get an answer on either one, you must input the other one.
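The spreadsheet logic above can be sketched in a few lines, assuming that lot capacity scales up with the number of lots per load; the parameter values in the example are invented:

```python
# Sketch of the spreadsheet formulas above; the parameter values are invented.

def wafer_capacity(work_time_h, machine_count, uptime_ratio,
                   time_per_visit_h, visits, rework,
                   lots_per_load, wafers_per_lot, yield_ratio):
    machine_time = work_time_h * machine_count * uptime_ratio
    lot_time = time_per_visit_h * visits * (1 + rework)
    loads_per_period = machine_time / lot_time           # how many runs fit
    lot_capacity = loads_per_period * lots_per_load      # each run carries a full load
    return lot_capacity * wafers_per_lot * yield_ratio

# One machine, up 100% of a 168-hour week, 6 h per visit, 4 visits, no rework,
# 4 lots per load, 25 wafers per lot, 100% yield:
print(wafer_capacity(168, 1, 1.0, 6, 4, 0.0, 4, 25, 1.0))  # 700.0
```

In a spreadsheet, each argument of this function would be a column, and each workstation a row.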

Queuing network models, a.k.a. "flow-and-queue" models, attempt to answer some of the questions left open by spreadsheets. They aim to predict the hitherto elusive WIP levels and cycle time, given a steady-state work load, process routes and simple resource allocation rules.

They have been tried as a means of analyzing manufacturing systems, mostly by academics and with limited success in industry. We can identify two reasons for this. First, it is not established that a theory developed to model service systems ranging from airport check-in counters to computer networks is a good fit to manufacturing. Second, even if this approach can technically be made to work, it requires an excessive level of mathematical sophistication on the part of the user. Those who have the mathematical background to understand it usually do not work in production control.

Queuing theory was developed to design and analyze service systems such
as luggage check-in counters at airports, where both the arrival rate of
customers and their service times are random. The analogy between customers
at service windows and parts in front of machines is tempting but not necessarily
helpful. Mathematically tractable queuing models assume that, if you know
the present state of the system, its future does not depend on its past.
If the man in front of you at the booth has been there for 15 minutes,
you assume he will leave soon, but the queuing model doesn't. It
only knows that the man *is* there; whether he has been there one
minute or 15 makes no difference to the way the model anticipates his
*future* behavior.

In manufacturing, this assumption is clearly far-fetched. Where it holds, high equipment utilization can only be achieved at the expense of high WIP and long cycle times. The level of randomness implied may be found in an R&D pilot line; a production facility where it was present, on the other hand, would not be competitive. A well-debugged linked lithography line, where wafers are synchronized and move according to a regular beat, is an example of high utilization achieved with minimum WIP and theoretical cycle time.
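As an illustration of this trade-off under the memoryless assumption, the simplest textbook queue (M/M/1) ties average WIP directly to utilization. This is a sketch of that formula only, not a claim that any fab workstation behaves this way:

```python
# Illustrative only: the simplest Markovian queue (M/M/1), not a claim
# that any fab workstation behaves this way.

def mm1_wip(utilization):
    """Expected number in the system: L = rho / (1 - rho)."""
    assert 0 <= utilization < 1
    return utilization / (1.0 - utilization)

for rho in (0.5, 0.8, 0.9, 0.95):
    print(f"utilization {rho:.2f} -> average WIP {mm1_wip(rho):.1f} lots")
```

Note how WIP explodes as utilization approaches 100%: this is the mathematical form of the trade-off described above.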

Even though these models are at best a fair fit to most manufacturing problems, the same is true for spreadsheets. Technically, it should be possible to use them for worst-case scenarios and to provide order-of-magnitude estimates for quantities such as WIP and cycle time. To be useful, however, those estimates would have to be understood and believed by the decision makers. So far, this barrier has not been surmounted. No one has yet found a way to explain the approach without its mathematical underpinnings, or to communicate those effectively to manufacturing personnel.

By comparison with queuing networks, the concept of discrete-event simulation is simple. TYECIN trainers estimate that, on the first try, they make themselves understood by 75% of audiences made up of production control personnel, equipment engineers and fab supervisors. The idea is to treat fab operations like a board game, where various agents make moves in turn that can be disrupted by chance in the form of equipment failures or delays.

Simulation models hold images of the machines, lots, operators, and other components of the fab and emulate the events that affect them, including lot releases, operation starts and ends, machine breakdowns and repairs, preventive maintenance, shift starts and ends, and breaks. Simulation models are a "brute force" computer recreation of the events that take place in the fab over a period of time.

Discrete-event simulation can be done manually, using a blueprint of a proposed shop layout and moving paper dolls made from Post-it notes cut to scale, to visualize the dynamics of operations (See Example 1). The setup effort is small and the results quite useful, albeit of limited scale. The same logic is applied in computer simulations for long-term performance analysis or short interval scheduling.

The computing resources and time required to run simulations *used*
to be the major drawback of the technique. Tying up a mainframe computer
for seven hours was not deemed worthwhile, when many such runs would have
been needed just to validate a model. The advent of cheap and powerful
workstations has reduced the turnaround time on simulations to a few minutes,
and thereby changed the rules.

However, the efforts required to set up and validate the models, and to interpret the simulation results remain substantial, and should be undertaken with a clear and useful purpose.

Simulations are used for performance analysis in two steps, as shown in Figure 2. First, the simulator executes events, such as lot starts and moves, that change a modeled state of the factory and generate a simulated history. That history must then be analyzed as a real one would be, to obtain (1) measures of throughput, cycle time and inventory for products and processes, and (2) measures of equipment utilization and queue lengths.

**Figure 2. Discrete-event simulation applied to performance
analysis**
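To make the board-game idea concrete, here is a minimal, hypothetical event loop: a single machine, lots arriving at fixed intervals, and events pulled in time order from a heap. This is a toy sketch of the general technique, not a description of ManSim's internals:

```python
import heapq

# A toy discrete-event loop: events are (time, kind, lot) tuples pulled in
# time order. All numbers are invented for illustration.
def simulate(n_lots, interarrival=2.0, process_time=3.0):
    events = [(i * interarrival, "arrive", i) for i in range(n_lots)]
    heapq.heapify(events)
    machine_free_at = 0.0
    done = {}
    while events:
        t, kind, lot = heapq.heappop(events)
        if kind == "arrive":
            start = max(t, machine_free_at)        # queue if the machine is busy
            machine_free_at = start + process_time
            heapq.heappush(events, (machine_free_at, "finish", lot))
        else:
            done[lot] = t                          # record completion time
    return done

print(simulate(3))  # {0: 3.0, 1: 6.0, 2: 9.0}
```

A full simulator adds operators, failures, batching and dispatch rules to this same loop, but the time-ordered event queue is the heart of the technique.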

Inputs to simulation models include information about process recipes, equipment, operators, and the fab work calendar. Events with little or no variability in their length, including many process recipe steps and measurements, are modeled as taking a fixed amount of time. Events that have significant randomness, such as equipment failures, are timed by computer "rolls of the dice" using mean times between failures (MTBF) derived from historical data.
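One common way to implement such "rolls of the dice", shown here as an assumption rather than a description of any particular simulator, is to draw times to failure from an exponential distribution whose mean is the observed MTBF (the MTBF value below is invented):

```python
import random

# Draw times to failure from an exponential distribution with the observed
# MTBF. The 120-hour MTBF is a hypothetical value.
MTBF_HOURS = 120.0
random.seed(42)  # fixed seed, analogous to rerunning with the same seed

times_to_failure = [random.expovariate(1.0 / MTBF_HOURS) for _ in range(10000)]
mean = sum(times_to_failure) / len(times_to_failure)
print(round(mean))  # close to the 120-hour MTBF
```

Fixing the seed makes a run reproducible; varying it across runs, as in the examples below, exposes the spread of possible outcomes.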

Simulations provide many outputs. The results include estimates of the following parameters:

- Cycle times, queue times, and yield by process.
- Utilization, queue lengths, lot waiting times, and percentage of "up" and "down" time by machine.
- Utilization and availability by operator.

Unlike spreadsheets, simulations do not examine the fab in terms of individual pieces of equipment. Instead, they look at the whole fab as a system, explicitly modeling the relationships and dependencies between different types of lots, equipment, and operators. In addition, simulation models are dynamic, examining how the fab runs over a specific period of time while taking into account dispatching rules, lot priorities, batching rules, and end-of-shift effects.

The design of a semiconductor production facility is usually done in four weeks. The group in charge of designing the plant has no opportunity to communicate with those who will operate it after it is built. This shows in the resulting facilities, in the form of imbalances in operator utilizations and arcane material flows. Some operators are constantly busy while others are idle most of the time. In having only a few operators, all fully utilized, there is more at stake than saving labor costs. Bored operators worry about the security of their jobs, and performing poorly designed tasks hurts their morale. Furthermore, management has at best a loose understanding of production capacities and labor time requirements.

Various types of simulations have a key role to play in designing a plant around operators, and in arranging for simple material flows that map naturally to the process sequence. For this purpose, simulations with paper dolls on blueprints are more effective than software systems. They can be carried out interactively in group sessions involving experienced production supervisors and operators. The precise capacity information gathered as a result can then be passed on to higher-level tools addressing the issues covered in Examples 2 and 3.

Figure 3 shows the layout of a section of a diffusion area. Several questions need to be asked about the roles operators have to play in it:

**Variations in production volume.** Will it be possible to run it at half speed with half as many operators as at full speed? With a poor job design, operators cannot move from an idle station to a busy one. All stations must then be staffed at all times for their local peak loads.

**Variation in product mix.** How will the mix of long and short furnace operations affect operator requirements?

**Equipment failures.**

- What are the daily checks that can be worked into the operator's routine without slowing down production?
- What kind of microstoppages can occur, and what effect will they have on operator requirements?
- Are there tasks operators can do during major repairs?

**Human failures.** What kind of foolproofing mechanisms can be designed into the system to prevent operator slips or misses?

**Figure 3. Diffusion area layout**

First, we list operator tasks as in Figure 4, separating for each one the time when manual intervention is required from unattended machine operation.

**Figure 4. Tasks at diffusion station**

We can then string together these tasks in a form similar to a Gantt chart, but with connections reflecting the operator's walking time between machines, and with the separation between manual and unattended operation. We obtain the kind of task plan shown in Figure 5, which represents a repeating cycle of operator activities, with movements symbolically shown in Figure 6. The sequence can be changed, and some of the equipment can be moved. The consequences of such actions can be anticipated through this analysis.

**Figure 5. Operator task plan**

The flow we use in Example 2 is shown in Figure 7. It is intended to be the simplest possible example highlighting additional problems that spreadsheet models do not typically account for. It is nonetheless a real example in the sense that we have excerpted the sequence of steps from an actual process recipe. In it are found loops, varying load sizes, and unreliable equipment. On the other hand, we assumed that operators were available in large numbers.

**Figure 7. Excerpt from an IC process flow**

Following are a few examples of the problems that the simulation illustrates:

**Equipment dependencies.** When a particular machine is unavailable, any equipment that depends on this machine as its primary source of lots may be idled.

**End-of-shift slowdown.** As the end of shift approaches, operators may not begin working on lots that cannot be completed before the end of their shift, though spreadsheets typically assume that the entire shift is available for processing at each machine.

**Changeovers, batching, and dispatch rules.** Spreadsheets have no means of assessing the impact of lot dispatching, equipment setup, and machine loading decisions made in the fab, even though these decisions have an impact on fab capacity and lot cycle time.

We compare the fab capacity estimates from a simulation model with the predictions made by a spreadsheet model. For this simple example, the spreadsheet model described above predicted the capacity of the fab, defined by the workstation with the smallest estimated capacity, to be over 5700 wafers per week.

Using TYECIN's ManSim package, we made several simulation runs of 100 days, using different random number seeds and varying start rates. The output from these runs predicted that the fab could produce no more than 4200 wafers per week.

In the following paragraphs, we explain the discrepancies between the results of the two approaches:

**Starvation**. The spreadsheet model estimates the capacity of rca_cln to be 5745 wafers/week, the lowest of any workstation in the model. Since cleaning sinks are inexpensive, the user's reaction might be to buy more. However, our simulation recognizes that, due to various flow irregularities at implant, rca_cln is not the primary bottleneck and actually has about 4% idle time.

**Batching effects and equipment setups**. As discussed earlier, spreadsheet models have difficulty dealing with machines that have a load size of more than one lot. These models typically assume either that the machine always runs with a full load or that the machine has an easy-to-estimate average load size, which is included in the spreadsheet as one of many "fudge factors."

**Shift changeovers**. To account for end-of-shift effects, a spreadsheet model requires guesswork about the time lost to them. In this example, the spreadsheet model assumes that there is no end-of-shift effect, while in the simulation model no lot starts processing unless it can be completed by the end of the current shift.

In addition, a similar input value must be supplied to somehow account for time spent on equipment setups. However, in a given fab, partial loads may occur, and lots often wait while equipment is set up. Batching and setup decisions may be based on established rules, on a supervisor's knowledge, or on a case-by-case basis, but they are in any case difficult to distill down to the simple input parameter values required by a spreadsheet. In contrast, simulation models allow the user to specify rules about when to batch lots and perform changeovers, which can come much closer to what actually happens. This example features a multi-lot processor (F_nox) which will wait only a certain amount of time before running a partial load, as well as two workstations (implant and F_nox) which perform changeovers based on lot queue times.
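A batching rule of the kind just described, run the machine when the load is full or when the oldest lot has waited too long, can be sketched as follows; the load size and waiting limit are hypothetical, not taken from the example:

```python
# Hypothetical batching rule of the kind described: start the run when the
# load is full, or when the oldest waiting lot has exceeded max_wait hours.

def should_start_batch(arrival_times, now, load_size=4, max_wait=2.0):
    if not arrival_times:
        return False
    if len(arrival_times) >= load_size:
        return True                              # full load ready
    oldest_wait = now - min(arrival_times)
    return oldest_wait >= max_wait               # run a partial load

print(should_start_batch([0.0, 0.5, 1.0, 1.5], now=1.5))  # True: full load
print(should_start_batch([0.0], now=2.5))                 # True: waited 2.5 h
print(should_start_batch([2.0], now=2.5))                 # False: keep waiting
```

Rules like this are easy to state to a simulator but have no natural expression as a single spreadsheet cell.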

The type of problem we address in Example 3 is illustrated in Figure
8. We want to assess the ability of a fab with the proposed layout and
equipment set to withstand various work loads. We are putting ourselves
in the shoes of a fab designer trying to anticipate the behavior of a fab
that is *yet to be built*. In this application, "model validation"
does not have the same meaning as for an operating fab. The point is not
to have simulated performance match actual performance, but to guess better
than with other methods.

**Figure 8. A full fab layout**

This example has:

- 60 pieces of equipment.
- Two processes with 196 and 188 process steps respectively.
- Six different products.

The spreadsheet model estimates equipment capacity by approximating the number of visits per lot as a weighted average of the number of visits per process to a given piece of equipment. Similarly, the processing time is estimated as a weighted average of the processing times at the various visits to that equipment. The spreadsheet model is unable to take into account the batching of different types of lots together, the time spent on equipment setup due to multiple products in the fab, and the effect of lot priorities on throughput.
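The weighted-average approximation can be sketched in a few lines; the mix fractions, visit counts, and times below are invented for illustration:

```python
# Sketch of the weighted-average approximation; the mix fractions, visit
# counts, and times are invented for illustration.

def weighted_average(mix, value_by_process):
    """mix: {process: fraction of starts}; values weighted by that mix."""
    return sum(mix[p] * value_by_process[p] for p in mix)

mix = {"proc_A": 0.6, "proc_B": 0.4}
visits = {"proc_A": 10, "proc_B": 14}     # visits per lot to one workstation
hours = {"proc_A": 1.5, "proc_B": 2.0}    # processing hours per visit

print(round(weighted_average(mix, visits), 1))  # 11.6
print(round(weighted_average(mix, hours), 2))   # 1.7
```

The averages are then fed into the capacity formulas described earlier, which is precisely where the product-mix information gets lost.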

The simulation includes multiple products, and multiple types of lots
contending for resources. Hot lots, batching and dispatch rules are modeled
explicitly. Briner/Yeaman Engineering puts together such models on a routine
basis as part of fab design, *in less than one week*. Most significantly,
for a fab with multiple types of products, simulation models produce outputs
such as cycle time and WIP estimates for each product type. Spreadsheets
cannot provide any information whatsoever about these performance measures,
which do impact the sizing of the materials handling systems.

**The results**

The spreadsheet model, created in Microsoft Excel, analyzes each of the workstations in the fab individually, as described above, and shows the stepper as the workstation with the smallest production capacity. The capacity of the stepper workstation, and thus of the fab, is estimated to be 1275 wafers per week. The simulation model, created in ManSim, was run 10 times for 100 days each, with a different random number seed for each run. The simulation also identified the stepper workstation as the primary bottleneck. However, the simulation model estimates fab capacity to be only 974 wafers per week, about 24% less than the spreadsheet predicts.

In addition to the reasons listed in Example 2, the factors below help explain this discrepancy:

**Operator shortages in critical areas**. As discussed in our first example, spreadsheet models do not take people into account very effectively, even though the impact of operators on fab performance is significant. In this example, there is only one implant operator available per shift to operate two different types of implanters. Consequently, the high voltage implanter, which has an average of 7.1 lots in queue, still spends 17.7% of its time idle, in part due to the unavailability of an operator.

**Multiple products**. The estimates of capacity given by the spreadsheet model assume a certain product mix, but the spreadsheet gives no information at all about the different product types. In contrast, the simulation model takes lot sizes and priorities into account while estimating cycle time, WIP, and other performance measures for each product type. Below is a table from one of the simulation model runs:

| Product | Wafer outs/week | Average cycle time (hours) | Average queue time (hours) | Average processing time (hours) |
|---|---|---|---|---|
| Prod-123 | 266 | 787.6 | 672.6 | 63.5 |
| Prod-789 | 140 | 806.7 | 692.0 | 77.3 |
| Prod-abc | 188 | 948.5 | 833.6 | 15. |
| Prod-xyz | 126 | 946.2 | 844.7 | 32.6 |
| Prod-yyy | 160 | 884.7 | 762.6 | 77 |
| Prod-zzz | 101 | 779.6 | 685.8 | 91.1 |

**Significant batching**. The F_anneal workstation has a load size of four lots, and is shown to be a major bottleneck by both the spreadsheet model and the simulation. However, while the spreadsheet assumes that this equipment is processing whenever it is not down, the simulation model recognizes that the equipment spends a significant amount of time (over 50%) waiting for enough lots to form a full batch for processing.

**Customized equipment**. One of the assumptions of spreadsheet models is that machines of the same make and model are interchangeable: they have the same capacity to perform the same operations. In fact, once a machine has been in a factory for a while, it is usually customized by means of special fixtures, and is no longer interchangeable with its siblings. In contrast, simulation models represent each machine as a unique entity, and allow it to have its own processing speed, reliability parameters, load sizes, and preventive maintenance schedule. This allows simulations to model a fab more accurately, especially one with different products or processes.

Many fabs lack the operational information that bridges the gap between a run sheet and a shop layout. This information is generated as a by-product of the manual simulations illustrated in Example 1. In an existing fab, those are worthwhile projects for small group activities of operators. For new fabs, more time should be allocated at the design stage to accommodate this type of analysis. Unless this is done, not only will key input parameters be missing, but the minds of the users will not be ready to accept the outputs of computer simulations.

A simulation study should be undertaken to answer questions that are formulated clearly ahead of time, such as: "If we agree to add this product to our mix, how will it affect our delivery performance for all the others?" On the other hand, it may not be advisable to expend too much energy simulating reworks. Like fumbling a football, reworking wafers is an activity that everyone knows would best be dispensed with. Whatever resources are available are better spent avoiding it than getting a precise estimate of the damage it is causing.

**Build minimal models **

If you have a problem that can be solved on the back of an envelope, do it. Otherwise, if a spreadsheet will work, use it. As we have seen above, there are answers a spreadsheet model will not provide. To get these, use a simulation. Then build the smallest model that will provide the desired answer. If a concept can be checked out with a two-station network, don’t try it on a full fab model. In many cases, useful qualitative results can be obtained with a simplified model. If, on the other hand, a simulation is intended to be used for short interval scheduling, then it must accurately reflect many details of the fab.

**Choose effective computer tools**

Tullis [Tul90] discusses the experience of HP’s R&D facility in this regard. His conclusion was that simulation languages like SLAM-II or SimScript required so much setup and model maintenance effort that a productive relationship with end-users could not be sustained, and that a special-purpose, table-driven simulation system like ManSim was more effective.

Conclusions

Back-of-the-envelope calculations, spreadsheets, and queuing networks all entail directly calculating the parameters of models. Like almost all applied mathematics, these approaches use exclusively linear methods, not because they fit but because they are tractable. The dynamics of product movement from machine to machine through process flows are intrinsically non-linear. The behavior of a large production system cannot be deduced simply from that of its component subsystems using linear models. Attempts to do so result in entering as inputs the answers one should get as outputs, and in protecting oneself against variability with fudge factors. In this as in other fields of engineering, simulation bypasses the need for simplistic assumptions.

We cannot have an equation or even an algorithm to calculate what we are after, but we can build an electronic replica of the fab, make the computer play it like a board game, and get our answers from the score sheet. Generally speaking, spreadsheets are inadequate to model the coordinated use of several types of resources, as well as the effect of frequent changeovers. Until recently, the spreadsheet was still preferred in spite of its limitations because of the vast amounts of computing resources needed by simulators. The advent of cheap and powerful workstations has made this a moot point.

[Bau90] Baudin, Michel, Manufacturing Systems Analysis, Prentice Hall, 1990, Chapters 19 and 22

[Bur86] Burman, David et al., Performance analysis techniques for IC manufacturing lines, AT&T Technical Journal, Vol. 65, Issue 4, pp. 46-57, 1986

[Cer88] Cernault, Argan, La simulation des systèmes de production, Cépaduès-Editions, 1988

[Law90] Lawton, James W. et al., Workload regulating wafer release in a GaAs fab facility, ISMSS, 1990

[Mil90] Miller, David J., Simulation of a semiconductor manufacturing line, Communications of the ACM, Vol. 33, No. 10, pp. 89-108, 1990

[Tul90] Tullis, Barclay et al., Successful modeling of a semiconductor R&D facility, ISMSS, 1990