This article provides background, use cases, and technical information about the implementation in Minitab Statistical Software of the G chart developed by James Benneyan.
Developed by James Benneyan in 1991, the g chart (or “G chart” in Minitab) is a control chart that is based on the geometric distribution. Benneyan has since published several papers about the G chart and a companion chart, the h chart. (Health Care Management Science, Vol 4, pages 305-318, 2001, is the article used as the basis for Minitab’s G chart.) The majority of applications cited in these papers are for monitoring infection rates in healthcare, such as nosocomial infections. Nosocomial infections are infections that occur as a direct result of a patient’s treatment in a medical facility.
P charts and U charts are often used to monitor adverse events such as nosocomial infections. But P charts and U charts require very large quantities of data and specific definitions of the data. For example, if you use a U chart to monitor nosocomial infections, each patient day is considered an area of opportunity in which one or more infections could occur. Thus, the data are the number of infections per patient day. If you use a P chart, the data are the number of patient days in which one or more infections occur. As for the data requirements, if you follow the standard practice of requiring a minimum of 25 to 35 subgroups to establish control limits, and the infection rate is low (for example, < 1%), the required amount of data is at least 12,500 patients (500 patients per subgroup multiplied by 25 subgroups). This means that it could take weeks, months, or perhaps even years to accumulate enough data to detect and respond to changes in the infection rate.
The geometric distribution provides an alternative probability model. In the geometric distribution, you count the number of opportunities before or until the defect occurs. Thus, in a healthcare setting where you monitor the infection rate, the ideal would be to count the number of patients or procedures until an infection is observed. While this is the ideal, it is also rarely done, because of complications with counting the actual number of patients through the system, or the number of procedures. What is most often done is to count the number of days between observed infections. The key assumption used when counting the number of days is that the number of patients or procedures per day is fairly constant.
As mentioned earlier, most of the applications cited in Benneyan's paper are from healthcare settings. But the G chart is appropriate for processes in which the defect rate is very low and for processes that show a natural geometrically decaying pattern. Benneyan [1] cites several examples of this natural decaying pattern: the number of re-worked welds per manufactured item, the number of detected software bugs, the number of items on delivery trucks, and the number of invoices received per day.
Like other control charts, the G chart has a center line and upper and lower control limits. The calculations for the control limits almost always result in a negative lower control limit. When the calculated lower control limit is negative, the lower limit is set to 0.
The actual data that are plotted on the chart are the number of opportunities between defects. In health care settings, opportunities are typically defined as days. This makes interpreting the G chart unusual, because, if the infection rate increases, the number of days between infections is reduced to as low as 0, if infections occur on the same day. At the same time, if the rate decreases, the number of days between infections increases. Thus, a point that is above the upper control limit indicates an unusually long period of time between adverse events. In other words, the rate is unusually low. Therefore, the upper control limit is often used as an indication that a significant improvement has been made.
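As a minimal sketch of how the plot points could be derived when events are recorded as dates, the Python snippet below takes the difference in days between consecutive events. The function name and the example dates are hypothetical and are not part of Minitab.

```python
from datetime import date

def days_between_events(event_dates):
    """Return the number of days between consecutive events.

    Two events on the same day give a plot point of 0.
    N dates give N - 1 plot points, because the first event has no predecessor.
    """
    ordered = sorted(event_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]

# Hypothetical infection dates, for illustration only
infections = [date(2015, 1, 3), date(2015, 1, 3), date(2015, 1, 10), date(2015, 2, 1)]
gaps = days_between_events(infections)
print(gaps)  # [0, 7, 22]
```

Short gaps (toward 0) correspond to a higher infection rate, and long gaps correspond to a lower rate.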
One problem with the G chart is that you usually cannot obtain points that are below the lower control limit, because the lower control limit is set at 0 and the minimum data value is also 0. Thus, while you can observe a signal that is above the upper control limit, which indicates that the adverse event rate was unusually low, you cannot detect when the adverse event rate is unusually high. In a practical sense, you want to study an unusually low rate to see what you were doing right during that time period. But you also want to respond as soon as possible to an unusually high rate to determine the cause for the increase in the rate. Thus, the G chart, using only Test 1 (1 point outside the control limit), does not provide adequate detection of the change in the adverse event rate that is of the most concern.
Benneyan [2] discusses several solutions to this problem. One solution is to use the additional tests: Test 2, Test 3, and Test 4. The most logical choice is Test 2, because nine points in a row below the center line probably indicate that the infection rate has increased. Minitab recommends always using Test 1, which is turned on by default, and Test 2, which is optional. Test 3 and Test 4 are also optional.
A second solution is the “Benneyan test”, which is described in Benneyan [2]. Points that fail the Benneyan test are marked on the chart with a capital “B”. The Benneyan test counts the number of consecutive plot points that are equal to 0. The number of points that are required to trigger a signal is a function of the desired false alarm rate and the process p. Minitab bases the false alarm rate on the probability for the Test 1 argument. For example, if the Test 1 argument is 3 (the default), then the probability of being above the upper control limit is <= 0.0013499. Minitab uses this as the alpha value in the Benneyan test, which makes the probability of a signal at 0 <= 0.0013499. The Benneyan test is also turned on by default.
Benneyan [2] describes the power and Average Run Length (ARL) of the G chart for detecting changes in the adverse event rate using: Test 1; Test 1 and Test 2; and Test 1 and the Benneyan Test. Test 1, by itself, has adequate power to detect a decrease in the adverse event rate. But Test 1, by itself, has 0 power to detect an increase in the adverse event rate. Using Test 1 and the Benneyan test provides the most significant improvement in the power to detect increases in the adverse event rate.
Another issue with the common implementation of the G chart is that it uses the standard method of constructing control charts, where the center line is the mean of the data and the control limits are set at ± 3 standard deviations from the center line. The geometric distribution is highly skewed (see Figure 1), so the ± 3 standard deviation method produces a lower control limit that is too low and an upper control limit that is not high enough. The low lower limit rarely matters in practice, because a negative lower control limit is set to 0. An upper control limit that is not high enough, however, causes a high false alarm rate at the upper control limit. This false alarm rate is 0.01825 (see Figure 2), which is more than 13 times as large as the rate for a chart that is based on a normal distribution, which is 0.0013499.
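The size of this false alarm rate can be checked with a short calculation. The sketch below assumes the geometric distribution counts the number of opportunities before the event (values 0, 1, 2, ...), so the mean is (1 - p)/p, the standard deviation is sqrt(1 - p)/p, and P(X > x) = (1 - p)^(floor(x) + 1). The function name is hypothetical.

```python
import math

def upper_tail_false_alarm(p, k_sigma=3.0):
    """P(X > mean + k_sigma * sd) for a geometric distribution on 0, 1, 2, ...

    Assumes P(X = k) = p * (1 - p)**k, i.e. X counts opportunities before the event.
    """
    mean = (1 - p) / p
    sd = math.sqrt(1 - p) / p
    ucl = mean + k_sigma * sd
    # For integer-valued X, P(X > ucl) = P(X >= floor(ucl) + 1) = (1 - p)**(floor(ucl) + 1)
    return (1 - p) ** (math.floor(ucl) + 1)

for p in (0.001, 0.01, 0.05, 0.1):
    print(p, round(upper_tail_false_alarm(p), 5))
```

For the values of p tried here, the computed rate stays near 0.018, which is in line with the Figure 2 value quoted above and far above the 0.0013499 of a normal-based chart.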
Another solution proposed by Benneyan [2] is to use probability limits. The advantage of using probability limits, as described in Benneyan [2], is a dramatic reduction in the false alarm rate for the upper control limit, which is the limit that signals a reduction in the adverse event rate. Minitab uses the probability limit method for determining the control limits. The probability of having a point that is beyond a given control limit is set to match the corresponding probability for a standard control chart based on the normal distribution, such as an I chart or Xbar chart. Using the usual 3 standard deviation limits in an I chart or an Xbar chart, the probability of having a point above the upper control limit is 0.0013499, and the probability of having a point below the lower control limit is also 0.0013499. Minitab uses this probability to define the upper and lower probability limits for the G chart, as a default. The lower control limit is set at the 0.0013499 quantile (the 0.13499th percentile) of the geometric distribution. The upper control limit is set at the 0.99865 quantile (the 99.865th percentile). The center line is set at the 0.5 quantile (the 50th percentile), also called the median.
Figure 1. PDF for Geometric Distributions with varying p
Figure 2. Probability of X > mean + 3 standard deviations
Data entered is one of two types:
Note: Both data types result in a minimum value of 0, and both result in a lower control limit of 0 in all but the most unusual instances (p < 0.00135).
Xi = plot points, as explained above
Xbar = average of Xi
N = number of data values used in the calculations (if data are dates, subtract 1 since we use differences and there is no difference for the first event)
phat = estimate of the adverse event rate (that is, the probability that an event occurs in a specified interval, such as a day). If a historical estimate is supplied, then that value is used. Otherwise, phat is calculated from the data as follows:
phat = ((N-1)/N)/(Xbar + 1)
p1 = 0.00135
p2 = 0.5
p3 = 0.99865 = 1 - p1
LCL = invcdf(p1) using geometric distribution with parameter phat
CL = invcdf(p2) using geometric distribution with parameter phat
UCL = invcdf(p3) using geometric distribution with parameter phat
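The calculations above can be sketched in Python as follows. This is an illustration under stated assumptions, not Minitab's code: the geometric distribution is assumed to count opportunities before the event (values 0, 1, 2, ...), and invcdf(q) is taken to be the smallest integer k with P(X <= k) >= q. Minitab's handling of discrete inverse CDF values may differ. The function names and the example data are hypothetical.

```python
import math

def geometric_invcdf(q, p):
    """Smallest k >= 0 with P(X <= k) >= q, where P(X = k) = p * (1 - p)**k."""
    def cdf(k):
        return 1 - (1 - p) ** (k + 1)
    # Closed-form starting point, then adjust to guard against rounding error
    k = max(0, math.ceil(math.log(1 - q) / math.log(1 - p) - 1))
    while k > 0 and cdf(k - 1) >= q:
        k -= 1
    while cdf(k) < q:
        k += 1
    return k

def g_chart_limits(x, p_hist=None, p1=0.00135, p2=0.5, p3=0.99865):
    """Estimate phat and the probability-based LCL, CL, and UCL for plot points x."""
    n = len(x)
    xbar = sum(x) / n
    # Use the historical estimate if one is supplied, otherwise estimate from the data
    phat = p_hist if p_hist is not None else ((n - 1) / n) / (xbar + 1)
    return {
        "phat": phat,
        "LCL": geometric_invcdf(p1, phat),
        "CL": geometric_invcdf(p2, phat),
        "UCL": geometric_invcdf(p3, phat),
    }

# Hypothetical days-between-infections data, for illustration only
days = [12, 3, 0, 25, 7, 18, 1, 9, 30, 5]
limits = g_chart_limits(days)
print(limits)
```

With these example data, Xbar is 11 and phat is 0.9/12 = 0.075. The lower control limit comes out as 0, which is consistent with the note above that the lower limit is 0 unless p is very small.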
Test 1: 1 point outside percentiles defined above
Test 2: K points in a row on one side of the center line
Test 3: K points in a row, all increasing or decreasing
Test 4: K points in a row, alternating up and down
In Test 1, if the argument, K, is 3, then the p1 and p3 values defined above are used to obtain the lower control limit and upper control limit. If the Test 1 argument, K, is not equal to 3, then define p1’ and p3’ as the CDF values of Normal(0,1) for -K and +K and obtain the lower control limit and the upper control limit using p1’ and p3’.
If the lower control limit is 0, the G chart cannot detect an increase in the adverse event rate because there can be no points below the lower control limit. Therefore, Minitab applies an additional test for detecting an increase in the adverse event rate. This test is outlined in Benneyan [2] and Benneyan [3].
The Benneyan test uses the following method to determine cp, the number of consecutive points equal to 0 that are required to generate a signal:
cp = ln(CDF Normal(0,1)(-K)) / ln(phat)
Note: Calculate the ratio above and round up to the next integer.
Note: CDF Normal(0,1)(-K) is the CDF for a normal distribution with mean 0 and standard deviation 1, evaluated at -K, where K is the Test 1 argument.
In a run of consecutive points that are equal to 0, the cp-th point and every subsequent point in the run are marked on the chart with the symbol “B”. For example, if cp is equal to 4 and there are 5 points in a row that are equal to 0, then the 4th and 5th points in the run are marked with the symbol “B” on the chart.
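A minimal Python sketch of the Benneyan test as described above follows. The function names, the value of phat, and the example data are hypothetical. The rule reflects the fact that the probability of cp consecutive zeros is phat^cp, so cp is the smallest integer for which that probability is at most the Test 1 tail probability.

```python
import math
from statistics import NormalDist

def benneyan_run_length(phat, k=3.0):
    """Number of consecutive zeros needed to signal: ceil(ln(alpha) / ln(phat))."""
    alpha = NormalDist().cdf(-k)  # standard normal CDF at -k
    return math.ceil(math.log(alpha) / math.log(phat))

def benneyan_flags(x, cp):
    """True for each point that is the cp-th or later point in a run of zeros."""
    flags = []
    run = 0
    for value in x:
        run = run + 1 if value == 0 else 0
        flags.append(run >= cp)
    return flags

cp = benneyan_run_length(0.15)          # hypothetical phat
points = [3, 0, 0, 0, 0, 0, 2]          # hypothetical plot points
flags = benneyan_flags(points, cp)
print(cp, flags)
```

With phat = 0.15 and K = 3, cp is 4, so in the run of five zeros only the 4th and 5th zeros are flagged, which mirrors the example in the text.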
Prepared by Dr. Terry Ziemer, SIXSIGMA Intelligence