Playing Games with a Purpose: Teaching Two-Sample Hypothesis Tests

We usually think of games as a pleasant distraction—just something we do for fun. However, growing evidence suggests that games can do more than keep us entertained, especially when it comes to learning in a classroom setting.

Because statistics doesn’t come easily to most people and is rarely thought of as fun, properly designed games can be a valuable tool for sparking interest and explaining difficult concepts.

What kinds of “properly designed” games are we talking about? Not traditional board games like Monopoly or Chutes and Ladders, but interactive computer games—the types of games younger generations have grown up with.

Dr. Shonda Kuiper, associate professor and chair of the mathematics and statistics department at Grinnell College, Kevin Cummiskey, assistant professor at the United States Military Academy, and Colonel Rod Sturdivant, associate and academy professor at the United States Military Academy, have been exploring the use of computer games in their classrooms for many years.

“We shifted the focus away from statistical calculations that aren’t tied to the context of scientific research,” says Kuiper. “Our materials provide an alternative to lectures and textbook style problems, and incorporate research-like experiences in the classroom.”

To incorporate research-like experiences into their instruction, Kuiper, Cummiskey, and Sturdivant had students use game-based labs. “The labs leverage students’ natural curiosity and desire to explain the world around them, so they can experience both the power and limitations of statistical analysis,” says Kuiper.

For example, the trio used the online computer game “Tangrams” to teach hypothesis testing, and had students use statistical software like Minitab to analyze data along the way.

Defining Hypothesis Testing

Of the many topics taught to students in introductory-level statistics courses, hypothesis testing is among the most challenging to understand at the conceptual level.

Hypothesis tests are statistical procedures that evaluate two mutually exclusive statements about a population. These two statements are called the null hypothesis and the alternative hypothesis. They are always statements about population attributes, such as the value of a parameter, the difference between corresponding parameters of multiple populations, or the type of distribution that best describes the population. A hypothesis test uses sample data to determine which statement is best supported by the data.

Examples of questions you can answer with a hypothesis test include:

  • Is the average time to complete a Tangrams game less than 2 minutes?
  • Is the average completion time of a game different for males and females?
  • Do science and engineering majors complete games more quickly than other majors?
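As a rough illustration of the first question above, the sketch below computes a one-sample t statistic in Python using only the standard library. The completion times are made-up values, not data from the Tangrams lab, and the variable names are illustrative assumptions.

```python
import math
import statistics

# Hypothetical completion times in seconds (NOT from the Tangrams data set).
times = [95, 110, 102, 130, 88, 121, 99, 105]

mu0 = 120  # null-hypothesis mean: 2 minutes, expressed in seconds

n = len(times)
mean = statistics.mean(times)
sd = statistics.stdev(times)  # sample standard deviation (n - 1 denominator)

# One-sample t statistic: how many standard errors the sample mean lies from mu0.
t_stat = (mean - mu0) / (sd / math.sqrt(n))

print(f"n = {n}, mean = {mean:.2f} s, sd = {sd:.2f} s")
print(f"t = {t_stat:.3f} on {n - 1} degrees of freedom")
```

A negative t statistic points in the direction of the alternative (a mean below 2 minutes). The p-value would then come from the t distribution with n - 1 degrees of freedom, for example through Minitab's 1-Sample t test.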

Most hypothesis tests in Minitab are located in the Stat > Basic Statistics menu, although some, like the chi-square test, are located in Stat > Tables > Cross Tabulation and Chi-Square.

Teaching Hypothesis Testing with “Tangrams”

In the Tangrams lab developed by Kuiper, Cummiskey and Sturdivant, students are introduced to hypothesis testing through a web-based puzzle game. Players must solve a puzzle in which they cover an image by flipping, rotating, and moving a set of shapes.

The web interface of the Tangrams puzzle game.

The Tangrams website collects each player’s information and automatically records their completion times. The students can download the data set for the entire class, which is available for immediate use through the website.

Students take on the role of a researcher by selecting from a wide variety of independent variables to explain why some students complete the game faster than others. For example, a student may decide to investigate whether game completion times differ based on the type of music played in the background, and then translate this research question into a testable hypothesis.

Next, students can analyze their data by calculating summary statistics and plotting histograms of the Tangrams completion times in statistical software such as Minitab. Because completion times tend to vary widely among the students, the data sets tend to be “messy” and often do not follow a normal distribution.
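For readers without Minitab at hand, the same first look at the data can be sketched in Python with the standard library. The completion times below are invented to mimic a right-skewed class data set, and the function names are illustrative assumptions.

```python
import statistics
from collections import Counter

# Hypothetical completion times in seconds; the long right tail mimics "messy" data.
times = [42, 55, 61, 48, 73, 66, 58, 120, 51, 64, 47, 310, 69, 57, 82]

def summarize(values):
    """Return basic summary statistics as a dict."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

def bin_counts(values, bin_width=30):
    """Count values per fixed-width bin and return {bin_start: count}."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

summary = summarize(times)
hist = bin_counts(times)

for key, value in summary.items():
    print(f"{key:>6}: {value:.1f}")

# Crude text histogram: one asterisk per observation in each 30-second bin.
for start, count in hist.items():
    print(f"{start:>4}-{start + 29:<4} {'*' * count}")
```

A mean well above the median, as in this made-up sample, is one quick sign of the right skew that makes a normality assumption questionable.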

This makes the analysis engaging for students, because they must discuss and make decisions about data cleaning, such as whether to remove outliers. Then they must check assumptions, conduct appropriate statistical significance tests, and state their conclusions. “Many statistics courses discuss model assumptions and removing outliers or erroneous data,” Kuiper says, “but students rarely face data analysis challenges where they must make and defend their own decisions.”

Application in the Classroom

To illustrate how to implement the Tangrams lab in the classroom, we will consider a class that chooses to investigate the relationship between a student’s academic major and the time it takes to complete the puzzle. Specifically, the class wants to answer the following research question: Are students who major in math, science, and engineering faster at completing the puzzle than students majoring in other subjects?

Prior to starting the game, the players enter pertinent data about themselves into the Tangrams web interface. For this example, students entered their type of major: either “MSE” for math, science, and engineering majors, or “Other” for all other majors.

Students input pertinent data about themselves using the web interface of the Tangrams game.

After each student plays the game, their data is matched with their puzzle completion time. When the last student completes the puzzle, the class’s data is immediately available for analysis.

Before delving into data analysis, the students need to translate the research question into testable hypotheses. In this case, they want to see if the difference between the means of two populations—MSE majors and other majors—is statistically significant.

The null (H0) and alternative (Ha) hypotheses, stated here in two-sided form, would be:

H0: MSE majors have the same Tangrams average completion time as students in other majors.

Ha: MSE majors and other majors do not have the same Tangrams average completion time.
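In symbols, the two statements can be written as follows. The notation is an assumption made here for illustration, not taken from the lab materials: the two symbols stand for the population mean completion times of MSE majors and of other majors.

```latex
H_0:\ \mu_{\text{MSE}} = \mu_{\text{Other}}
\qquad \text{versus} \qquad
H_a:\ \mu_{\text{MSE}} \neq \mu_{\text{Other}}
```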

Now the students input their data into a Minitab worksheet to calculate basic summary statistics.

Students can input class data from the Tangrams lab into a Minitab worksheet.

In Minitab, students use Stat > Basic Statistics > Display Descriptive Statistics to identify the sample mean and standard deviation of the completion times of the MSE majors and other majors. They can also choose to view other summary statistics, such as the median, mode, and variance.

Minitab’s Display Descriptive Statistics function shows the sample mean and standard deviation of the completion times of the 96 MSE majors and 32 other majors who played Tangrams.

To view the distribution of the data, students use Graph > Histogram > Simple to create a histogram:

This histogram makes it easy to see the distribution of the completion times for other majors, including the high and low times, as well as the mean completion time.

The students also use Graph > Boxplot > Multiple Y’s > Simple to view the data distribution for both populations and to easily identify outliers.
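The outlier flagging that a boxplot performs visually can be approximated in code. The sketch below uses the common 1.5 × IQR fence convention, which is an assumption here, and Python's default quartile method, which may give slightly different quartiles than other software. Both groups contain invented values.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (outliers, (low_fence, high_fence)) using the k * IQR rule."""
    # statistics.quantiles with n=4 returns the three quartile cut points
    # (default "exclusive" method).
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < low or v > high]
    return outliers, (low, high)

# Hypothetical completion times in seconds for each group (NOT the class data).
groups = {
    "MSE": [42, 55, 61, 48, 73, 66, 58, 120, 51, 64, 47, 310, 69, 57, 82],
    "Other": [50, 62, 71, 58, 66, 75, 69, 64, 80, 59, 67],
}

flagged = {}
for name, values in groups.items():
    outliers, (low, high) = iqr_outliers(values)
    flagged[name] = outliers
    print(f"{name}: fences = ({low:.1f}, {high:.1f}), flagged = {outliers}")
```

Flagging a point is only the start of the discussion. Whether a very slow time is an error or a legitimate observation is exactly the kind of decision students are asked to make and defend.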

Next, to determine if there is a statistically significant difference between the means of MSE majors and other majors, the students conduct a two-sample hypothesis test in Minitab. Because there are two independent populations and the students want to determine if the average completion times are the same, they should choose a two-sample t-test (Stat > Basic Statistics > 2-Sample t) to compute the p-value.
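To show what such a test computes, here is a minimal standard-library sketch of a two-sample t-test in its unequal-variance (Welch) form; the choice of the Welch form is an assumption made for this illustration. The two samples are invented, so the output will not match the class result, and the p-value is obtained by numerically integrating the t density with Simpson's rule rather than by calling a statistics package.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area under the density between -|t| and |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # density is symmetric about zero
    return max(0.0, 1.0 - central_area)

def welch_t_test(a, b):
    """Return (t statistic, Welch-Satterthwaite df, two-sided p-value)."""
    n1, n2 = len(a), len(b)
    v1 = statistics.variance(a) / n1  # squared standard error, group 1
    v2 = statistics.variance(b) / n2  # squared standard error, group 2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, t_two_sided_p(t, df)

# Hypothetical completion times in seconds (NOT the class data).
mse = [42, 55, 61, 48, 73, 66, 58, 120, 51, 64, 47, 95]
other = [50, 62, 71, 58, 66, 75, 69, 140, 80, 59]

t_stat, df, p_value = welch_t_test(mse, other)
print(f"t = {t_stat:.3f}, df = {df:.1f}, two-sided p = {p_value:.3f}")
```

The p-value is then compared with the chosen significance level; a value above α means the null hypothesis is not rejected.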


For the class data in this example, the p-value for the two-sample t-test on type of major was 0.26, which is not significant at the α = 0.05 significance level (equivalently, at 95% confidence).

Therefore, the class would fail to reject the null hypothesis: the data do not provide sufficient evidence of a difference between the two population means.

The results of the hypothesis test will likely surprise students, who may note that the average completion time for MSE majors is 22% faster than that of the other majors. This seems to imply that MSE majors outperformed other majors. However, students would be ignoring the large standard deviation in the completion times, which increases the uncertainty about where the population means actually lie.
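A small calculation makes the role of the standard deviation concrete. In the sketch below, the group sizes (96 and 32) come from the example above, but the means and standard deviations are hypothetical values chosen only so that the means differ by 22%. The function name is an illustrative assumption.

```python
import math

def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic (unequal-variance form) from summary statistics."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

n_mse, n_other = 96, 32              # group sizes reported in the example
mean_mse, mean_other = 78.0, 100.0   # HYPOTHETICAL means, 22% apart

# Same difference in means, increasingly noisy data (HYPOTHETICAL standard deviations).
t_by_sd = {}
for sd in (20.0, 60.0, 120.0):
    t_by_sd[sd] = welch_t_from_summary(mean_mse, sd, n_mse, mean_other, sd, n_other)
    print(f"sd = {sd:>5.1f} s  ->  t = {t_by_sd[sd]:.2f}")
```

The difference in means never changes, yet the t statistic shrinks toward zero as the spread grows, which is why a seemingly large gap can still fail to reach significance.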

Following the hypothesis test, students can check the basic assumptions of the t-test. One important assumption is that the students participating in the research form a random sample. Checking it exposes students to the challenges researchers face when conducting experiments, because in practice a random sample is difficult to obtain. There are many reasons why the sample for this classroom example is not random. For instance, only four sections of this particular statistics class participated.

Reactions from Students

So what do students think about this approach to learning statistics?

Many of Cummiskey’s students responded very favorably to the game-based lab, commenting that they liked being involved in the data collection process because it made the data “real” to them.

“As a group, students enjoyed playing the games,” says Cummiskey. “The labs seemed to truly engage students and many commented that they saw how statistical procedures are actually used by people outside the statistics classroom.”

With support from National Science Foundation grants NSF DUE #0510392 and NSF DUE #1043814, Kuiper and others developed materials that can be used as projects within an introductory statistics course or to synthesize key elements learned throughout a secondary statistics course.

The materials can form the basis of an individual research project and can help students and researchers in other disciplines better understand how statisticians approach the scientific process.

Sample materials and datasets, including those for the Tangrams lab discussed in this article, are freely available at

Read more about Cummiskey, Kuiper, and Sturdivant’s research in the paper, “Using classroom data to teach students about data cleaning and testing assumptions,” Frontiers in Quantitative Psychology and Measurement, September 2012. The paper can be downloaded for free at:
