More than 90 inches of snow fell in Boston in the winter of 2015, but how rare was this occurrence? This article explores the rarity of the 2015 Boston snowfall amount in terms of Sigma Levels and examines University of Oklahoma meteorologist Sam Lillo’s estimate of the likelihood of this event occurring.
By Dr. Diane Evans, Six Sigma Black Belt and Professor of Engineering Management at Rose-Hulman Institute of Technology, and Thomas Foulkes, National Science Foundation Graduate Research Fellow in the Electrical and Computer Engineering Department at the University of Illinois at Urbana-Champaign
Dr. Diane Evans and Thomas Foulkes
The amount of snow that fell in Boston in winter 2015 was anything but “normal.” When the photo displayed in Figure 1 was taken, an overwhelming amount of 90 inches (7.5 feet) had fallen.
Figure 1. A woman shovels snow on Joy Street during a blizzard in Boston on February 15, 2015. [Taylor, 2015]
Following this historic snowfall of 94.4 inches in a 30-day period in 2015, Sam Lillo, a University of Oklahoma meteorologist, analyzed historical weather data from the Boston area from as far back as 1938 in order to determine the rarity of this event. Outlined in the resource Attachment A [Agency, 2015] and discussed in Section 2 of this paper, Lillo developed a simulated set of one million hypothetical Boston winters by sampling with replacement snowfall amounts gathered over 30-day periods. Eric Holthaus, a journalist with The Washington Post, reported that Lillo’s results indicated that winters like the 30 days of consecutive snowfall from January 24 to February 22, 2015 should “only occur approximately once every 26,315 years” in Boston, as displayed in Figure 2. [Holthaus, 2015] In Six Sigma terminology, the likelihood of this 30-day snowfall amount occurring is 38 out of a million.
Figure 2. Sam Lillo’s simulated 30-day Boston snowfall amounts. The previous record of 58.8 inches was set in 1978. [Holthaus, 2015] The simulated (not actual) record using this hypothetical data is 119.7 inches.
As Six Sigma practitioners, this article was appealing to us because of the suggested rarity of this event. Since the 1980’s, industries across the world (e.g., manufacturing, healthcare, management, academia) have been measuring the quality of processes and products based on Six Sigma methodology. [Deming, 1986] In simple terms, when a company makes a part for a product, it needs to meet customer specifications. If a part does not meet the specifications for its intended use, then the part is considered defective. In order to reduce the number of defective parts, the company’s goal is to center a part’s dimension on the targeted mean and to reduce its spread or standard deviation about the mean. In doing so, parts that do not meet specifications, i.e., defective parts, are rare events.
The Sigma Level of a process indicates how well it meets its set specification limits. As is well-known in Six Sigma methodology, a process with Sigma Level 6σ is expected to have only 3.4 defects per million (DPM). In other words, only 3.4 process parts out of a million are expected to be beyond its closest specification limit. It is important to note that the 3.4 DPM is calculated assuming that the process is normally distributed and its mean shifts 1.5 standard deviations over time. If the process is normally distributed without this shift, then the process has a “long term” Sigma Level of 7.5σ. If a process is not normally distributed and does not shift, like Boston snowfall amounts, then its Sigma Level is the number of standard deviations from its mean to its closest specification limit.
The purpose of this paper is to explore the rarity of the 2015 Boston snowfall amount in terms of Sigma Levels and to examine Lillo’s estimate of the likelihood of this event occurring. To assess Lillo’s findings, we obtained snowfall amounts in a specified Boston location from 1891 to 2015 via the National Oceanic and Atmospheric Administration (NOAA) for comparison with his simulated data.
On March 15, 2015, the cumulative Boston snowfall of 108.6 inches surpassed the previous Boston record of 107.6 inches set in the winter of 1996. In Figure 2, a graphical display of Boston snow statistics from 1938 to 2015 illustrates the quick rise in snowfall amounts in 2015 as compared to record setting snowfalls in years 1996, 1994, and 1948. Included in the figure is the annual average Boston snowfall through early June. The final tally on Boston’s brutal snowfall in 2015 clocked in at 110 inches. [Basu, 2015] The condensed timeframe in which this historic and devastating snow accumulated created a public safety crisis, resulting in dozens of deaths and hundreds of injuries and causing tens of millions of dollars in destruction. [Agency, 2015]
Figure 3. Graphical display of Boston’s snow statistics since 1938 [@NWSBoston, 2015]
The dashed rectangular region inserted in the graphic highlights the 30 days of snowfall from January 24 to February 22, 2015 resulting in 94.4 inches of snow. In order to obtain hypothetical 30-day Boston snowfall amounts, Lillo first generated one million resampled winters by:
... stitching together days sampled from past winters. A three-day period was chosen, to represent the typical timescale of synoptic weather systems. In addition, to account for the effect of long-term pattern forcing, the random selection of 3-day periods was weighted by the correlation between consecutive periods. Anomalies tended to persist across multiple periods, such that there’s a better chance that a snowier than normal three days would follow a similarly snowy three days. This is well observed (and in extreme cases, like this year), so it’s important to include in the simulation. [Agency, 2015]
After generating the one million resampled winters, Lillo recorded the snowiest 10-period stretches, i.e., 30 days, from each winter. Percentile ranges of the resampled distribution were compared to the distribution of observed winters to check the validity of the simulated data. In simulating the winters’ snowfalls in this manner, Lillo had to assume that consecutive winters and winter snow patterns within a particular year were independent and identically distributed (IID). We recognize that these assumptions are not necessarily valid.
Since we were unable to obtain Lillo’s simulated data and are using actual historical data for our own Sigma Level calculations, we used a digitizer and Figure 2 to simply create a “copy” of his data for further analysis. Using Engauge Digitizer, an open source program, a blue cross was placed at the top center of each vertical bar of the distribution, as shown in Figure 4. Each blue cross was converted into an (x, y) coordinate pair and stored in an Excel spreadsheet. The digitized x coordinates correspond to the x-axis values on the given plot since only one blue cross was associated with each vertical bar. The digitized y coordinates were first normalized* against the plot’s peak or maximum value. Then, these normalized values were multiplied by 40,000 in order to scale the curve to match Figure 2. The digitized plot of the normalized data is also displayed in Figure 4.
Normalizing in this sense rescales data to the unit interval.
Figure 4. Digitizing Lillo’s data with Engauge Digitizer (left) and the resulting digitized plot (right).
Data values for “Maximum 30-day snowfall (in)” and “Number of winters” from the Excel worksheet were pasted into two columns in Minitab. Histograms of the snowfall amounts with overlaid probability plots were constructed to offer reasonable distributions to fit the data. The following Minitab commands were used to construct the histograms in Figure 5.
Figure 5. The gamma and largest extreme value distributions are fit to the maximum 30-day snowfall amounts. Note that the sample size N is not one million since the digitized data is not an exact replica of Lillo’s one million data points.
The gamma distribution provides the best visual fit to the maximum 30-day snowfall amounts. To statistically identify the best fit, we used Minitab’s Individual Distribution Identification. By default, an Anderson-Darling (AD) test for goodness-of-fit is performed and the numerical results are displayed with the graph. The Anderson-Darling procedure is a general test to compare the fit of an observed cumulative distribution function to an expected cumulative distribution function. This test gives more weight to the tails than the Kolmogorov-Smirnov test, which is important because of the rare 2015 snowfall amount.
Minitab could not construct probability plots of the data given the very large sample size of nearly one million values. We scaled the frequency of the snowfalls to ten thousand data points using their probability of occurrence. A column of the ten thousand snowfalls called “Scaled Snowfall Amounts” was created and used to obtain probability plots for various distributions. The following Minitab commands were used to obtain the AD test statistic, p-value, likelihood ratio test p-value for each distribution, as displayed in Figure 6, and probability plots in Figure 7.
Figure 6. Minitab’s goodness-of-fit test results, where AD is the value of the Anderson-Darling test statistic for a distribution, P is the p-value corresponding to the distribution, and LRT P is the likelihood ratio test p-value. An asterisk indicates that a value cannot be calculated.
For every 3-parameter distribution, except the Weibull distribution, there is no established method for calculating its p-value, so the likelihood-ratio test (LRT) must be used. A likelihood ratio test p-value (LRT P) less than a significance level α indicates that, for distributions with this extra parameter, adding it significantly improves the distribution’s fit. For example, the LRT p-value 0 for the 3-parameter gamma distribution indicates that the gamma distribution with three parameters is a much better fit to the data than the gamma distribution with only two parameters. Unfortunately, adding the third parameter to the gamma distribution shifts it slightly left of x = 0. Although the shift is minor (approximately 1.262 units), it does have an effect on the right tail probability, which is the main computational interest of this paper.
The extremely large sample size of the scaled data yielded p-values less than the typical significance level of α = 0.05 for all of the distributions. Although overly large sample sizes give distribution tests very high power, small deviations or departures from a distribution are considered statistically significant, resulting in miniscule p-values. In these large sample size cases, we should not consider the p-values to be practically significant. Patrick Runkel and Jim Frost advise in their blog posts
Large Samples: Too Much of a Good Thing? and
How to Identify the Distribution of Your Data using Minitab, respectively, to disregard the p-values obtained from the AD goodness-of-fit test and compare only AD values for the distributions. [Runkel, 2012], [Frost, 2012] Lower AD values indicate better fits. A visual inspection of probability plots combined with Anderson-Darling values can help indicate whether a distribution is a good fit. The probability plots for several distributions are displayed in Figure 7.
Figure 7. Probability plots of the scaled snowfall amounts for the normal, gamma, and largest extreme value distributions. Clearly the normal distribution provides the worst fit among the three distributions.
Of the fourteen distributions provided by Minitab to fit the snowfall data, the gamma distribution was found to have the best fit according to the probability plots and AD test statistics for modeling only positive x values.
Let X represent the gamma random variable that best fits the maximum 2015 30-day snowfall data. A gamma random variable X with positive shape parameter α and positive scale parameter β has probability density function:
with expected value and variance:
The parameters for the distribution, shape α ≅ 4.214 and scale β ≅ 5.382, were computed by Minitab’s Individual Distribution Identification procedure. Using the gamma distribution, the probability of a 30-day snowfall greater than 94.4 inches is:
This computation can be easily done in Minitab as well by using the following commands:
The resulting graphic is displayed in Figure 8.
Figure 8. The probability of the maximum 30-day snowfall of over 94.4 inches using the best fit gamma distribution and Lillo’s simulated snowfalls.
In simulating the million hypothetical Bostonian winters, Lillo made the simplifying assumption that the Boston climate is static. [Holthaus, 2015] Assuming a static climate implies that the mean number of inches of snow received in 30-day periods each winter does not shift over time. While we do not necessarily agree with this, we made the same assumption in order to compare our results with Lillo’s. Thus, in calculating the Sigma Level for the 2015 30-day snowfall amounts, we calculated the event’s long term Sigma Level, as discussed in Section 1. The Sigma Level for the best fit gamma distribution with approximate mean E (X) = αβ ≅ 4.214 ∙ 5.382 ≅ 22.680 and varianceVar (X) = αβ2 ≅ 4.214 ∙ 5.3822 ≅ 122.062 is:
A Long Term Sigma Level of 6.49σ is equivalent to a Short Term Sigma Level of 4.99σ given the 1.5σ shift in the mean. Although the 2015 Boston snowfall is not a true 6σ event according to Lillo’s simulated data, the likelihood of Boston experiencing this rare amount of snow in the next twenty-five centuries is near zero given its probability of occurrence of 3.675 ∙ 10-5.
Instead of creating and analyzing simulated snowfalls as Lillo did, we used historical Boston snowfall data to determine the rarity of the 2015 winter event. Seasonal snowfall data was obtained for two Weather Service Forecast Offices (WFSO) from the National Oceanic and Atmospheric Administration (NOAA): Boston WSFO AP and Boston Logan Airport. NOAA defines one snowfall season as all snow received from July through the following June. Thus, the 2014-2015 snowfall season, which we’ll refer to as the 2015 season, includes monthly records from July 2014 to June 2015. While the Boston WSFO AP station has snowfall amounts dating back to 1891, it does not have snowfall data available for this station after 2012. The snowfall records for the Boston Logan Airport station, which start in 1936, are in strong agreement with the Boston WSFO AP station and provide measurements for the 2013, 2014, and 2015 snowfall seasons. By appending these three snowfall seasonal records from the Boston Logan Airport station to the records from the Boston WSFO AP station, a total of 125 years of historical Boston seasonal snowfalls were gathered.
Unfortunately, the NOAA does not have daily snowfall totals for each year, only monthly totals. Because the specific snowfall amounts for the time period between January 24 and February 22 could not be isolated, we combined the daily snowfall amounts for the entire months of January and February. Although we cannot directly compare our results to Lillo’s, we can still study the rarity of the amount of snow received in these two months in 2015 compared to years dating back to 1891. A boxplot and histogram of the two month totals from 1891 to 2015 is displayed in Figure 9.
Figure 9. January and February Boston snowfall totals for years 1891 to 2015. The lone right tail outlier (99.1 in) is year 2015.
For the 125 snowfalls, we again used Minitab’s Individual Distribution Identification procedure to compute the Anderson-Darling goodness-of-fit test statistics and their respective p-values and likelihood ratio test p-values for a variety of distributions, as displayed in Figure 10. Although the largest extreme value, gamma, and 3-parameter loglogistic distributions have p-values greater than 0.250, both the largest extreme value and 3-parameter loglogistic have positive probabilities for x’s less than 0.
Figure 10. The largest extreme value, gamma, and 3-parameter loglogistic distributions are the best fits to January and February Boston historical snowfalls.
Figure 11 displays the gamma distribution with shape parameter α = 3.254 and scale parameter β = 7.963 that Minitab identified as the best reasonable fit for the historic Boston data. The parameters were determined by the Individual Distribution Identification procedure. By including all of January and February, this snowfall total as compared to the 30-day total increased from 94.4 inches to 99.1 inches, which is only 4.7 inches. Using the best fit gamma distribution, the probability of snowfall greater than 99.1 inches occurring is approximately 542 out of a million. The Long Term Sigma Level is 5.1σ, while the Short Term Sigma Level is 3.6σ.
Figure 11. The probability of the maximum 60-day snowfall of over 99.1 inches using the best fit gamma distribution and historical Boston snowfalls.
Recall that using Lillo’s simulated data, the probability of having a 30-day snowfall was 37 out of a million. Expanding the time period to all of January and February (59 or 60 days accounting for leap years) and using historical rather than simulated data increased the probability to 542 out of a million. One obvious reason for the difference is the rarity of the 2015 30-day total as compared to the 2015 60-day total as displayed in Figure 12. The figure shows that the winter of 2015 does not hold the record for the most snow in any 1, 2, or 3 day period; however, it holds the record for all other periods of time. [Letham, 2015] In 2015, Boston got 94.4 inches of snow in a 30-day time period, compared to the previous record (set in 1978) of approximately 59 inches. In a 60-day time period, the difference in snowfall totals between 2015 and the all-time record is not as large, as displayed by the inserted dotted lines in the figure. Also, the maximum 60-day total snowfall in any year may have occurred over a 60-day interval that does not start on January 1 and end on February 28 or 29. With respect to the simulated data, the difference in rarity calculations for the simulated versus historical data is likely the questionable IID assumption in creating the simulated snowfalls, as well as the assumption of weather as static. However, even when historical data is used to calculate the rarity of the 2015 Boston snowfall, the likelihood of 542 out of a million is still a negligible amount!
Figure 12. The total snowfall amounts for 2015, 1978, and “all-time records” winters for the number of days n varying from 1 to the entire winter. The all-time records line is from six previous winters: 1978, 1994, 1996, 2003, 2005, and 2011 using data that goes back to 1893. [Letham, 2015]
On February 17, 2015, the website FiveThirtyEight, which uses statistics to tell compelling stories, posted a time series-like chart of February snowfalls in Boston since 1891. [Enten, 2015] The chart, as displayed in Figure 13, provides a chronological view of the variability in snowfalls since that time. In building probability models and calculating Sigma Levels in Sections 4 and 5, we assumed that the conditions that resulted in the more typical snowfalls were the same as the conditions that resulted in the more extreme events, such as in 2003, 2005, 2011, and 2015. This assumption is not necessarily true; in fact, it rarely is. When data are not homogeneous, the fitted model will be incorrect and our predictions will be erroneous. [Wheeler, 2013]
Figure 13. Time-series like chart of historical February snowfalls in Boston since 1891. [Enten, 2015] Winter 2015’s amount (over 58 inches) only accounts for the snowfall total up to February 17, 2015.
The results calculated in this paper assume homogeneous annual snowfalls. Under that assumption, the likelihood of the debilitating snowfall that occurred in Boston in 2015 is at best 542 out of a million and at worst 38 out of a million. The goals of this paper were to confirm Lillo’s results and calculate similar snowfall likelihoods using historical data, which we did. Our future work is to determine whether the Boston historical snowfall data is justly homogeneous. Walter Shewhart, who was the inventor of the control chart, said the following about data presentation: “Whenever an average, range, or histogram is used to summarize the data, the summary should not mislead the user into taking any action that the user would not take if the data were presented as a time series.” [Wheeler, 2000] Our follow-up work will be to examine the Boston snowfalls over time to look for patterns in the historical data that may have had an effect on this unusual weather event. With recent news coverage on El Nino and global warming, control charts may be able to provide insight into regional climate change.
@NWSBoston. (2015, March 15). Retrieved from
Agency, M. E. (2015, March 27). Attachment A: 2015 Severe Winter Weather Pattern Impacts - Supplemental Information. Massachusetts, USA. Retrieved from
Basu, T. (2015, July 14). The Last Snow from Boston’s Nightmare Winter Has Finally Melted. TIME. Retrieved from
Deming, W. E. (1986). Out of the Crisis: Quality, Productivity and Competitive Position. Cambridge: Cambridge University Press.
Enten, H. (2015, February 17). Boston’s Ridiculous February Snowfall in One Chart. Retrieved from
Erdman, J. (2015, March 23). weather.com. (The Weather Channel) Retrieved from
Frost, J. (2012, March 8). How to Identify the Distribution of Your Data using Minitab. (blog.minitab.com) Retrieved from
Holthaus, E. (2015, February 25). Boston’s astounding month of snow a 1-in-26,315 year occurrence. The Washington Post. Retrieved from
Jędrzejewski-Szmek, Z., Muftakhidinov, B., Winchen , T., Trande, A., Lane, D., & Weingrill, J. (n.d.). Engauge Digitizer Tool (Version 6.0). Retrieved from
Letham, B. (2015, March 15). Was 2015 Boston’s worst winter yet? Boston. Retrieved from
Lillo, S. (2016, January 14). Personal Communication. University of Oklahoma Meteorologist.
Runkel, P. (2012, June 4). Large Samples: Too Much of a Good Thing. Retrieved from
Taylor, A. (author of article), Brian Snyder/Reuters (photographer), (2015, February 17). What Record-Breaking Snow Really Looks Like. Retrieved from
Wheeler, D. J. (2013, June 4). Why We Keep Having 100-Year Floods. Quality Digest. Retrieved from
Wheeler, D. J. (2000). Understanding Variation: The Key to Managing Chaos, 2nd Edition. SPC Press.
Get our free monthly e-newsletter for the latest Minitab news, tutorials, case studies, statistics tips and other helpful information.
Data is the new gold: 5 ways to make sure your data is reliable
Advancing the Power of Analytics
A Statistical Analysis of Boston’s 2015 Record Snowfall