# Doing Monte Carlo Simulation in Minitab Statistical Software

Doing Monte Carlo simulations in Minitab Statistical Software is very easy. This article illustrates how to use Minitab for Monte Carlo simulations using both a known engineering formula and a DOE equation.

by Paul Sheehy and Eston Martz

Monte Carlo simulation uses repeated random sampling to simulate data for a given mathematical model and evaluate the outcome.  This method was initially applied back in the 1940s, when scientists working on the atomic bomb used it to calculate the probabilities of one fissioning uranium atom causing a fission reaction in another. With uranium in short supply, there was little room for experimental trial and error. The scientists discovered that as long as they created enough simulated data, they could compute reliable probabilities—and reduce the amount of uranium needed for testing.

Today, simulated data is routinely used in situations where resources are limited or gathering real data would be too expensive or impractical. By using Minitab’s ability to easily create random data, you can use Monte Carlo simulation to:

• Simulate the range of possible outcomes to aid in decision-making
• Forecast financial results or estimate project timelines
• Understand the variability in a process or system
• Find problems within a process or system
• Manage risk by understanding cost/benefit relationships

## Steps in the Monte Carlo Approach

Depending on the number of factors involved, simulations can be very complex. But at a basic level, all Monte Carlo simulations have four simple steps:

### 1. Identify the Transfer Equation

To do a Monte Carlo simulation, you need a quantitative model of the business activity, plan, or process you wish to explore. The mathematical expression of your process is called the “transfer equation.” This may be a known engineering or business formula, or it may be based on a model created from a designed experiment (DOE) or regression analysis.

### 2. Define the Input Parameters

For each factor in your transfer equation, determine how its data are distributed. Some inputs may follow the normal distribution, while others follow a triangular or uniform distribution. You then need to determine distribution parameters for each input.  For instance, you would need to specify the mean and standard deviation for inputs that follow a normal distribution.
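Outside Minitab, the same idea can be sketched in a few lines of Python with NumPy. This is only an illustration of what "distribution plus parameters" means for each input; the three inputs and all of their parameter values below are hypothetical placeholders, not values from this article.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed so the draws are repeatable
n = 100_000                          # number of simulated values per input

# Hypothetical inputs: each one needs a distribution and its parameters.
normal_input = rng.normal(loc=10.0, scale=0.5, size=n)                    # mean, std dev
triangular_input = rng.triangular(left=2.0, mode=3.0, right=5.0, size=n)  # min, most likely, max
uniform_input = rng.uniform(low=0.0, high=1.0, size=n)                    # lower, upper bound

for name, values in [("normal", normal_input),
                     ("triangular", triangular_input),
                     ("uniform", uniform_input)]:
    print(f"{name:>10}: mean={values.mean():.3f}  std={values.std(ddof=1):.3f}")
```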

### 3. Create Random Data

To do a valid simulation, you must create a very large random data set for each input, something on the order of 100,000 instances. These random data points simulate the values that would be seen over a long period for each input. Minitab can easily create random data that follow almost any distribution you are likely to encounter.

### 4. Simulate and Analyze Process Output

With the simulated data in place, you can use your transfer equation to calculate simulated outcomes. Running a large enough quantity of simulated input data through your model will give you a reliable indication of what the process will output over time, given the anticipated variation in the inputs.

Those are the steps any Monte Carlo simulation needs to follow.  Here’s how to apply them in Minitab.

## Monte Carlo Using a Known Engineering Formula

A manufacturing company needs to evaluate the design of a proposed product: a small piston pump that must pump 12 ml of fluid per minute. You want to estimate the probable performance over thousands of pumps, given natural variation in piston diameter (D), stroke length (L), and strokes per minute (RPM).  Ideally, the pump flow across thousands of pumps will have a standard deviation no greater than 0.2 ml.

### Step 1: Identify the Transfer Equation

The first step in doing a Monte Carlo simulation is to determine the transfer equation. In this case, you can simply use an established engineering formula that measures pump flow:

$$\text{Flow (in ml)} = \pi \left(\frac{D}{2}\right)^2 \times L \times \text{RPM}$$

### Step 2: Define the Input Parameters

Now you must define the distribution and parameters of each input used in the transfer equation. The pump’s piston diameter and stroke length are known, but you must calculate the strokes-per-minute (RPM) needed to attain the desired 12 ml/minute flow rate. Volume pumped per stroke is given by this equation:

$$\pi \left(\frac{D}{2}\right)^2 \times L$$

Given D = 0.8 cm and L = 2.5 cm, each stroke displaces about 1.257 ml (0.4π cm³). Dividing the 12 ml/minute target by the unrounded stroke volume gives the required rate: 9.549 strokes per minute.
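The arithmetic is easy to check by hand or in a few lines of Python. The variable names here are illustrative only.

```python
import math

D = 0.8             # piston diameter, cm
L = 2.5             # stroke length, cm
target_flow = 12.0  # required flow, ml per minute

# Volume displaced per stroke: cylinder of diameter D and length L (1 cm^3 = 1 ml).
stroke_volume = math.pi * (D / 2) ** 2 * L

# Strokes per minute needed to reach the target flow.
rpm = target_flow / stroke_volume

print(f"stroke volume = {stroke_volume:.3f} ml")  # prints 1.257
print(f"required RPM  = {rpm:.3f}")               # prints 9.549
```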

Based on the performance of other pumps your facility has manufactured, you can say that piston diameter is normally distributed with a mean of 0.8 cm and a standard deviation of 0.003 cm. Stroke length is normally distributed with a mean of 2.5 cm and a standard deviation of 0.15 cm. Finally, strokes per minute is normally distributed with a mean of 9.549 RPM and a standard deviation of 0.17 RPM.

### Step 3: Create Random Data

Now you’re ready to set up the simulation in Minitab.  With Minitab you can instantaneously create 100,000 rows of simulated data.  Starting with the simulated piston diameter data, choose Calc > Random Data > Normal.  In the dialog box, enter 100,000 in Number of rows of data to generate, and enter “D” as the column in which to store the data.  Enter the mean and standard deviation for piston diameter in the appropriate fields.  Press OK to populate the worksheet with 100,000 data points randomly sampled from the specified normal distribution. Then simply repeat this process for Stroke Length (L) and Strokes per Minute (RPM).
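For readers who want to cross-check the worksheet outside Minitab, the same three columns can be generated with NumPy. This is a rough equivalent of the Calc > Random Data > Normal dialog, not a reproduction of Minitab's random number generator, so individual values will differ; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=2024)  # arbitrary seed, for repeatability only
n = 100_000                             # rows of simulated data

# One column per input, using the means and standard deviations from Step 2.
D = rng.normal(loc=0.8, scale=0.003, size=n)     # piston diameter, cm
L = rng.normal(loc=2.5, scale=0.15, size=n)      # stroke length, cm
RPM = rng.normal(loc=9.549, scale=0.17, size=n)  # strokes per minute

# Quick sanity check: sample statistics should sit close to the specified parameters.
for name, col in [("D", D), ("L", L), ("RPM", RPM)]:
    print(f"{name:>3}: mean={col.mean():.4f}  std={col.std(ddof=1):.4f}")
```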

### Step 4: Simulate and Analyze Process Output

Now create a fourth column in the worksheet, Flow, to hold the results of your process output calculations. With the randomly generated input data in place, you can set up Minitab’s calculator to calculate the output and store it in the Flow column. Go to Calc > Calculator, choose Flow as the column in which to store the result, and enter the flow equation from Step 1 as the expression. Minitab will quickly calculate the output for each row of simulated data.

Now you’re ready to look at the results. Select Stat > Basic Statistics > Graphical Summary and select the Flow column. Minitab will generate a graphical summary that includes four graphs: a histogram of the data with an overlaid normal curve, a boxplot, and confidence intervals for the mean and the median. The graphical summary also displays Anderson-Darling Normality Test results, descriptive statistics, and confidence intervals for the mean, median, and standard deviation.

For the random data generated to write this article, the mean flow rate is 12.004 ml based on 100,000 samples. On average, we are on target, but the smallest value was 8.882 and the largest was 15.594. That’s quite a range. The transmitted variation (of all components) results in a standard deviation of 0.757 ml, far exceeding the 0.2 ml target. The 0.2 ml target also falls outside the confidence interval for the standard deviation.
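The whole pump simulation can also be sketched in Python as a cross-check on the Minitab results. The function name `simulate_flow` and the seed are assumptions made for this example. Because the random draws differ from Minitab's, the exact minimum, maximum, and mean will not match the figures quoted here, but the standard deviation should land near 0.76 ml.

```python
import numpy as np

TARGET_STD = 0.2  # ml, the design goal for flow standard deviation

def simulate_flow(n=100_000, seed=0):
    """Draw n simulated pumps and return their flow rates in ml/min."""
    rng = np.random.default_rng(seed)
    D = rng.normal(0.8, 0.003, n)    # piston diameter, cm
    L = rng.normal(2.5, 0.15, n)     # stroke length, cm
    RPM = rng.normal(9.549, 0.17, n) # strokes per minute
    # Transfer equation: volume per stroke times strokes per minute.
    return np.pi * (D / 2) ** 2 * L * RPM

flow = simulate_flow()
flow_mean = flow.mean()
flow_std = flow.std(ddof=1)  # sample standard deviation

print(f"mean = {flow_mean:.3f} ml/min")
print(f"std  = {flow_std:.3f} ml/min (target: {TARGET_STD})")
print(f"min  = {flow.min():.3f}, max = {flow.max():.3f}")
print("meets target" if flow_std <= TARGET_STD else "too much variation")
```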

It looks like this pump design exhibits too much variation and needs to be further refined before it goes into production; Monte Carlo simulation with Minitab let us find that out without incurring the expense of manufacturing and testing thousands of prototypes.
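A rough analytical cross-check is also possible. Assuming the three inputs are independent and their variation is small relative to their means (assumptions of this sketch, not something the simulation requires), first-order error propagation for a product gives:

$$\left(\frac{\sigma_{\text{Flow}}}{\text{Flow}}\right)^2 \approx \left(\frac{2\,\sigma_D}{D}\right)^2 + \left(\frac{\sigma_L}{L}\right)^2 + \left(\frac{\sigma_{\text{RPM}}}{\text{RPM}}\right)^2$$

$$\left(\frac{2 \times 0.003}{0.8}\right)^2 + \left(\frac{0.15}{2.5}\right)^2 + \left(\frac{0.17}{9.549}\right)^2 \approx 0.0000563 + 0.0036 + 0.000317 \approx 0.00397$$

$$\sigma_{\text{Flow}} \approx 12 \times \sqrt{0.00397} \approx 0.756 \text{ ml}$$

This agrees closely with the simulated 0.757 ml, and it suggests that stroke length contributes roughly 90% of the transmitted variance, which would make it the natural place to start refining the design.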

Lest you wonder whether these simulated results hold up, try it yourself! Creating different sets of  simulated random data will result in minor variations, but the end result—an unacceptable amount of variation in the flow rate—will be consistent every time. That’s the power of the Monte Carlo method.
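One way to try this outside Minitab is to repeat the simulation with several different random seeds and compare the resulting standard deviations. The sketch below uses NumPy; the helper name `flow_std_for_seed` and the choice of five seeds are arbitrary.

```python
import numpy as np

def flow_std_for_seed(seed, n=100_000):
    """Run one pump simulation and return the sample std dev of flow (ml/min)."""
    rng = np.random.default_rng(seed)
    D = rng.normal(0.8, 0.003, n)     # piston diameter, cm
    L = rng.normal(2.5, 0.15, n)      # stroke length, cm
    RPM = rng.normal(9.549, 0.17, n)  # strokes per minute
    flow = np.pi * (D / 2) ** 2 * L * RPM
    return flow.std(ddof=1)

# Five independent replications of the whole simulation.
stds = [flow_std_for_seed(seed) for seed in range(5)]

for seed, s in enumerate(stds):
    print(f"seed {seed}: std = {s:.3f} ml/min")
print(f"spread across runs: {max(stds) - min(stds):.4f}")
```

Each run gives a slightly different value, but all of them sit far above the 0.2 ml target.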

## Monte Carlo Using a DOE Response Equation

What if you don’t know what equation to use, or you are trying to simulate the outcome of a unique process?

An electronics manufacturer has assigned you to improve its electrocleaning operation, which prepares metal parts for electroplating. Electroplating lets manufacturers coat raw materials with a layer of a different metal to achieve desired characteristics. Plating will not adhere to a dirty surface, so the company has a continuous-flow electrocleaning system that connects to an automatic electroplating machine. A conveyer dips each part into a bath which sends voltage through the part, cleaning it. Inadequate cleaning results in a high Root Mean Square Average Roughness value, or RMS, and poor surface finish. Properly cleaned parts have a smooth surface and a low RMS.

To optimize the process, you can adjust two critical inputs: voltage (Vdc) and current density (ASF). For your electrocleaning method, the typical engineering limits for Vdc are 3 to 12 volts. Limits for current density are 10 to 150 amps per square foot (ASF).

### Step 1: Identify the Transfer Equation

You cannot use an established textbook formula for this process, but you can set up a Response Surface DOE in Minitab to determine the transfer equation. Response surface DOEs are often used to optimize the response by finding the best settings for a "vital few" controllable factors.

In this case, the response will be the surface quality of parts after they have been cleaned.

To create a response surface experiment in Minitab, choose Stat > DOE > Response Surface > Create Response Surface Design.  Because we have two factors—voltage (Vdc) and current density (ASF)—we’ll select a two-factor central composite design, which has 13 runs. After Minitab creates your designed experiment, you need to perform your 13 experimental runs, collect the data, and record the surface roughness of the 13 finished parts. Minitab makes it easy to analyze the DOE results, reduce the model, and check assumptions using residual plots.  Using the final model and Minitab’s response optimizer, you can find the optimum settings for your variables.  In this case, you set volts to 7.74 and ASF to 77.8 to obtain a roughness value of 39.4.
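To make the 13 runs concrete, here is a sketch of the design points of a two-factor central composite design in coded units. It assumes a rotatable design (axial distance √2) with five center-point replicates, which is a common way to arrive at 13 runs; the exact design Minitab generates, and its run order, may differ.

```python
import itertools
import math

alpha = math.sqrt(2)  # assumption: axial distance for a rotatable two-factor design

# Coded units: -1 and +1 are the low and high factorial levels, 0 is the center.
factorial_points = list(itertools.product([-1.0, 1.0], repeat=2))         # 4 corner runs
axial_points = [(-alpha, 0.0), (alpha, 0.0), (0.0, -alpha), (0.0, alpha)]  # 4 axial runs
center_points = [(0.0, 0.0)] * 5                                          # assumption: 5 replicates

design = factorial_points + axial_points + center_points

print(f"{len(design)} runs (coded Vdc, coded ASF):")
for run, (vdc_coded, asf_coded) in enumerate(design, start=1):
    print(f"{run:2d}: {vdc_coded:+.3f}  {asf_coded:+.3f}")
```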

The response surface DOE yields the following transfer equation for the Monte Carlo simulation:

$$\text{Roughness} = 957.8 - 189.4\,(\text{Vdc}) - 4.81\,(\text{ASF}) + 12.26\,(\text{Vdc}^2) + 0.0309\,(\text{ASF}^2)$$
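As a quick check, the transfer equation can be written as a small Python function and evaluated at the optimized settings. The function name `roughness` is illustrative. With the rounded coefficients shown here the prediction comes out near 39.1, slightly below the 39.4 reported by the response optimizer; the difference presumably reflects rounding of the published coefficients.

```python
def roughness(vdc, asf):
    """Predicted RMS roughness from the response surface model (rounded coefficients)."""
    return (957.8
            - 189.4 * vdc
            - 4.81 * asf
            + 12.26 * vdc ** 2
            + 0.0309 * asf ** 2)

predicted = roughness(vdc=7.74, asf=77.8)
print(f"predicted roughness at optimum: {predicted:.1f}")  # prints 39.1
```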

### Step 2: Define the Input Parameters

Now you can set the parametric definitions for your Monte Carlo simulation inputs. (The standard deviations must be known or estimated based on existing process knowledge.) Volts are normally distributed with a mean of 7.74 Vdc and a standard deviation of 0.14 Vdc. Amps per Square Foot (ASF) are normally distributed with a mean of 77.8 ASF and a standard deviation of 3 ASF.

### Step 3: Create Random Data

With the parameters defined, it’s simple to create 100,000 rows of simulated data for our two inputs using Minitab’s Calc > Random Data > Normal dialog.

### Step 4: Simulate and Analyze Process Output

Now we can use the Calculator to enter our formula, followed by Stat > Basic Statistics > Graphical Summary. The summary shows that even though the underlying inputs were normally distributed, the distribution of the RMS roughness is non-normal. This is expected: the transfer equation is quadratic and the chosen settings sit near its minimum, so a deviation in either direction raises the roughness and skews the output. The summary also shows that the transmitted variation of all components results in a standard deviation of 0.521, and process knowledge indicates this is a good process result. Based on a DOE with just 13 runs, we can estimate how much variation the process is likely to show in practice.
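The electrocleaning simulation can be sketched in Python in the same way. The seed and variable names are arbitrary, and skewness is computed by hand here as a simple stand-in for the normality test in Minitab's graphical summary. Expect a standard deviation near 0.52 and a clearly positive skew.

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # arbitrary seed, for repeatability only
n = 100_000

# Inputs from Step 2: both normally distributed around the optimized settings.
vdc = rng.normal(7.74, 0.14, n)  # volts
asf = rng.normal(77.8, 3.0, n)   # amps per square foot

# Transfer equation from the response surface DOE (rounded coefficients).
rough = 957.8 - 189.4 * vdc - 4.81 * asf + 12.26 * vdc ** 2 + 0.0309 * asf ** 2

rough_std = rough.std(ddof=1)
# Moment-based skewness: 0 for a symmetric distribution, positive for a long right tail.
skewness = np.mean((rough - rough.mean()) ** 3) / rough.std() ** 3

print(f"mean = {rough.mean():.3f}")
print(f"std  = {rough_std:.3f}")
print(f"min  = {rough.min():.3f}, max = {rough.max():.3f}")
print(f"skewness = {skewness:.2f}")
```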

## Where Can You Apply the Monte Carlo Simulation?

The Monte Carlo method has come a long way since it revolutionized nuclear research in the 1940s. Today, using simulated data to develop a reliable parametric picture of a process’s outcome is a vital tool in industries including finance, manufacturing, oil and gas extraction, pharmaceuticals, and many more.

In nearly any situation for which you can develop a mathematical model, Minitab’s ability to create random simulated data gives you easy access to the power of the Monte Carlo simulation.