Sharing Ways to Illuminate Challenging Statistical Concepts

The best introductory statistics instructors know that merely memorizing how to perform procedures isn’t enough—students should understand what their results really mean. 

Over 15 years of teaching, Dr. Julie Belock, a professor at Salem State University, has developed a number of student projects that explore the meaning behind statistical techniques.  In a paper she presented at the 25th Annual International Conference on Technology in Collegiate Mathematics, Belock tackles three of the concepts students find most challenging by teaching with Minitab Statistical Software.

“I use Minitab software in these projects for its ease of use and production of excellent graphs, which aid the students in interpreting and presenting their work,” Belock writes.

Confidence Intervals

In statistics, we estimate the characteristics of populations by analyzing a subset of individuals, called a sample. But when you use sample data to estimate a population parameter like the mean, you’re very unlikely to match the true parameter exactly. A confidence interval is a range that is likely to contain that true value—so while you can’t provide the precise value, you can confidently say that the true mean falls within that range.

Every confidence interval is reported with a confidence level, and this is where students often get confused. They frequently assume a 90% confidence interval has a 90% chance of including the true mean. But the confidence level actually indicates your chance of randomly selecting a sample whose confidence interval contains the true parameter. “Once an interval is computed from a particular sample, it either contains the true mean or it does not,” Belock explains. “There is no longer anything random about it!”

Belock’s students see this firsthand by using Minitab to simulate a large number of random samples and generating confidence intervals for each. When they calculate the percentage of confidence intervals that contain the true parameter, the students find that this percentage approximates the confidence level.

[Figure: 90% confidence intervals from 20 simulated samples, with the true mean of 100 marked]

In the example above, students can see that each of these 90% confidence intervals either includes the true mean or does not, and that 17 out of 20 (85%, close to the 90% confidence level) do contain the true mean of 100.
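
The same experiment can be sketched outside Minitab. The Python below is a minimal illustration, not Belock's procedure. Only the true mean of 100 and the 90% level come from the example above; the population standard deviation of 15, the sample size of 25, and the use of a z-interval with a known standard deviation are assumptions chosen to keep the sketch short.

```python
import random
from statistics import NormalDist, mean


def simulate_coverage(true_mean=100.0, sigma=15.0, n=25,
                      level=0.90, n_samples=1000, seed=1):
    """Return the fraction of simulated z-intervals that contain true_mean.

    sigma and n are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    # Two-sided critical value (about 1.645 for a 90% interval).
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_samples):
        # Draw one random sample from a normal population.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = mean(sample)
        # This particular interval either contains the true mean or it does not.
        if xbar - margin <= true_mean <= xbar + margin:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    print("coverage with 20 intervals:  ", simulate_coverage(n_samples=20))
    print("coverage with 5000 intervals:", simulate_coverage(n_samples=5000))
```

With only 20 intervals the observed share can easily land a little above or below 90%. As the number of simulated samples grows, it settles near the confidence level.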

P-Values

Belock uses a similar approach to illustrate the concept of the p-value: the probability of obtaining a result at least as extreme as the one in your sample data simply by chance, assuming the null hypothesis is true. She provides an example in which 39% of sampled students say they are going directly to graduate school, where earlier data showed about 35% of all students went directly to grad school. Does this sample indicate the proportion of students going straight to grad school has increased?

A 1-proportion Z-test will calculate the p-value of the 39% result from the sample. But first, Belock’s students approximate the p-value another way. They generate 100 simulated random samples drawn from a population where 35% of students go directly to grad school. Then they figure out what percentage of those samples result in values at least as high as 39%. When students compare this frequency to a p-value generated from the 1-proportion test, they find they have a close match—and a clear understanding of what the p-value represents.
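
A rough Python sketch of this comparison follows. The 35% and 39% figures come from the example. The article does not state the sample size, so the value of 200 used here is purely hypothetical, and the test is written as a one-sided normal-approximation test, which may differ in detail from Minitab's output.

```python
import random
from statistics import NormalDist


def simulated_p_value(p0=0.35, observed=0.39, n=200, n_sims=100, seed=1):
    """Share of simulated samples whose proportion is at least `observed`.

    n=200 is a hypothetical sample size; the article does not give one.
    """
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_sims):
        # One simulated sample from a population where p0 go straight to grad school.
        successes = sum(rng.random() < p0 for _ in range(n))
        # Small tolerance guards against floating-point ties at the threshold.
        if successes / n >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_sims


def z_test_p_value(p0=0.35, observed=0.39, n=200):
    """One-sided p-value from the normal approximation (H1: proportion > p0)."""
    se = (p0 * (1 - p0) / n) ** 0.5
    z = (observed - p0) / se
    return 1 - NormalDist().cdf(z)


if __name__ == "__main__":
    print("simulated p-value (100 samples):", simulated_p_value())
    print("1-proportion z-test p-value:    ", z_test_p_value())
```

With 100 simulated samples the estimate is coarse and can differ from the test's p-value by a few percentage points. Raising `n_sims` tightens the match.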

Regression Diagnostics

A regression equation models the relationship between two quantitative variables. A scatterplot graphs the regression variables against each other, so you can visualize the nature of their relationship. A residual plot is a diagnostic tool for a regression analysis that lets you determine how well the regression equation explains the relationship between the variables, an idea students often struggle with.

A scatterplot shows a pattern if the data are associated, but a residual plot shows no pattern if the regression model is a good fit. To see how this works, Belock’s students use real data about bears (from a set included in Minitab’s sample data folder) to create several scatterplots. Some of the plots show a strong linear correlation, while others don’t. When the students perform linear regression on data that are not linear, a “bad” residual plot results, indicating the regression is a poor fit.

[Figure: fitted line plot and residual plot for a linear regression that fits poorly]

The regression shown in the fitted line plot (scatterplot) above is a poor fit, and yields a residual plot with a clearly curved pattern.

Then the students refine the regression by fitting a quadratic model. As they do, any patterns in their residuals disappear and they end up with a randomly scattered residual plot that indicates a good fit.

[Figure: residual plot for a regression that fits well]

A regression that fits the data well results in an unpatterned residual plot like the one above.
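
The contrast between the two residual plots can be imitated numerically. The sketch below is an assumption-laden illustration in Python: it uses synthetic curved data rather than Minitab's bear data set, and the helper names (`fit_polynomial`, `curvature_signal`, `make_curved_data`) are hypothetical. The "curvature signal" is an ad hoc stand-in for eyeballing a residual plot: it compares the average residual in the middle third of the x range with the average in the outer thirds.

```python
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via normal equations."""
    m = degree + 1
    # Augmented matrix for (X^T X) c = X^T y.
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][m] / a[i][i] for i in range(m)]


def residuals(xs, ys, coeffs):
    """Observed y minus fitted y for each point."""
    return [y - sum(c * x ** k for k, c in enumerate(coeffs))
            for x, y in zip(xs, ys)]


def curvature_signal(res):
    """Mean residual in the middle third minus the mean in the outer thirds.

    Assumes the residuals are ordered by increasing x.
    """
    k = len(res) // 3
    middle = res[k:2 * k]
    outer = res[:k] + res[2 * k:]
    return sum(middle) / len(middle) - sum(outer) / len(outer)


def make_curved_data(n=60, noise_sd=0.5, seed=1):
    """Synthetic data with a quadratic trend (not the bear data)."""
    rng = random.Random(seed)
    xs = [10 * i / (n - 1) for i in range(n)]
    ys = [2 + 0.5 * x ** 2 + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


xs, ys = make_curved_data()
linear = fit_polynomial(xs, ys, 1)
quadratic = fit_polynomial(xs, ys, 2)
print("linear fit signal:   ", curvature_signal(residuals(xs, ys, linear)))
print("quadratic fit signal:", curvature_signal(residuals(xs, ys, quadratic)))
```

A signal far from zero for the linear fit corresponds to the curved residual pattern, while a signal near zero for the quadratic fit corresponds to the randomly scattered one.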

Conclusion

These exercises have proven helpful for students of all levels, Belock notes. The hands-on approach keeps students actively involved in the learning process, while using Minitab for the calculations and graphs frees students to focus on the concepts.

“Minitab works better than others for these particular activities due to several factors,” Belock writes, “including ease of use, clear graphics and appropriate options, such as the ability to generate and display multiple confidence intervals simultaneously.”

Belock’s lessons, with step-by-step instructions, are detailed in “Addressing Challenging Statistical Topics with Minitab,” a paper presented at the 25th Annual International Conference on Technology in Collegiate Mathematics.
