Identifying the Distribution of Your Data

To choose the right statistical analysis, you need to know the distribution of your data. Suppose that you want to assess the capability of your process. If you conduct an analysis that assumes that the data follow a normal distribution when, in fact, the data are nonnormal, your results will be inaccurate. To avoid this costly error, you must determine the distribution of your data.

So, how do you determine the distribution? Minitab’s new Individual Distribution Identification is a simple way to find the distribution of your data so that you can choose the appropriate statistical analysis. You can use it to:

  • Determine whether a distribution that you used previously is still valid for the current data
  • Choose the right distribution when you’re not sure which to use
  • Transform your data to follow a normal distribution

Three Ways to Use Individual Distribution Identification

To confirm that a certain distribution fits your data

In most cases, your process knowledge helps you to identify the distribution of your data. In these situations, you can use Individual Distribution Identification to confirm that this distribution fits the current data.

Suppose that you want to perform a capability analysis to ensure that the weight of ice cream containers from your production line meets specifications. In the past, these data have been normal, but you want to confirm normality. Here’s how you use Individual Distribution Identification to quickly assess the fit.

  1. Choose Stat > Quality Tools > Individual Distribution Identification.
  2. Specify the column of data to analyze and the distribution to check it against.
  3. Click OK.

A given distribution is a good fit if:

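  • The data points roughly follow a straight line on the probability plot
  • The p-value of the goodness-of-fit test is greater than 0.05, so you cannot reject the hypothesis that the data follow the distribution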

In this case, the ice cream weight data appear to follow a normal distribution, so you can justify the use of normal capability analysis.
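For readers who want to see what the p-value check involves, here is a minimal Python sketch. It is an illustration only, not Minitab's implementation. It assumes an Anderson-Darling goodness-of-fit test for normality and converts the statistic to an approximate p-value with the widely cited D'Agostino and Stephens formulas. The function name and the container weights are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(data):
    """Return (adjusted A-squared, approximate p-value) for a normal fit.

    The mean and standard deviation are estimated from the data.
    """
    x = sorted(data)
    n = len(x)
    mu, sigma = mean(x), stdev(x)
    std_normal = NormalDist()
    # Normal CDF at each standardized value, clamped so that log() stays finite.
    cdf = [min(max(std_normal.cdf((v - mu) / sigma), 1e-12), 1 - 1e-12) for v in x]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Small-sample adjustment used when the parameters are estimated.
    a2_adj = a2 * (1 + 0.75 / n + 2.25 / n ** 2)

    # Piecewise approximation of the p-value (assumed formulas, see lead-in).
    if a2_adj >= 13:
        p = 0.0  # far below any practical significance level
    elif a2_adj >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_adj + 0.0186 * a2_adj ** 2)
    elif a2_adj > 0.34:
        p = math.exp(0.9177 - 4.279 * a2_adj - 1.38 * a2_adj ** 2)
    elif a2_adj > 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2_adj - 59.938 * a2_adj ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2_adj - 223.73 * a2_adj ** 2)
    return a2_adj, min(max(p, 0.0), 1.0)


# Hypothetical ice cream container weights in grams (made-up values).
weights = [498.2, 501.5, 499.8, 502.1, 500.4, 497.6, 500.9, 499.1,
           501.2, 500.0, 498.9, 502.6, 499.5, 500.7, 501.8, 498.4]
statistic, p_value = anderson_darling_normal(weights)
print(f"AD = {statistic:.3f}, p = {p_value:.3f}")
print("Normal fit not rejected" if p_value > 0.05 else "Normal fit rejected")
```

A p-value above 0.05 here means only that the test found no evidence against normality. You should still inspect the probability plot.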

To determine which distribution best fits your data

Suppose that you have successfully used more than one distribution in the past. You can use Individual Distribution Identification to help you decide which distribution best fits your current data. For example, you want to assess whether a particular weld strength meets customers’ requirements. A number of distributions have been used to model this type of data in the past. Here’s how you use Individual Distribution Identification to choose the distribution that best fits your data.

  1. Choose Stat > Quality Tools > Individual Distribution Identification.
  2. Specify the column of data to analyze and the distributions to check it against.
  3. Click OK.

Choose the distribution whose data points most closely follow a straight line on the probability plot and whose p-value is the highest.

In this case, the Weibull distribution is a better fit than the others because the data points roughly follow a straight line and its p-value is the highest.
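The "straight line" criterion can be quantified outside Minitab. The following Python sketch is a simplified stand-in, not Minitab's method: instead of goodness-of-fit p-values, it compares candidate distributions by the correlation coefficient of each linearized probability plot, where a value closer to 1 indicates a straighter plot. The function names, the plotting-position formula, and the weld strength values are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def plot_correlations(data):
    """Correlation of each linearized probability plot (positive data only)."""
    x = sorted(data)
    n = len(x)
    # Median-rank plotting positions (Benard's approximation, an assumption).
    p = [(i + 1 - 0.3) / (n + 0.4) for i in range(n)]
    z = [NormalDist().inv_cdf(pi) for pi in p]
    log_x = [math.log(v) for v in x]
    return {
        # Normal: x is linear in the standard normal quantile.
        "normal": pearson_r(x, z),
        # Lognormal: ln(x) is linear in the standard normal quantile.
        "lognormal": pearson_r(log_x, z),
        # Exponential: x is linear in -ln(1 - p).
        "exponential": pearson_r(x, [-math.log(1 - pi) for pi in p]),
        # Weibull: ln(x) is linear in ln(-ln(1 - p)).
        "weibull": pearson_r(log_x, [math.log(-math.log(1 - pi)) for pi in p]),
    }


# Hypothetical weld strength measurements (made-up values).
weld_strength = [12.4, 18.9, 23.1, 26.7, 29.8, 32.5, 35.9, 38.2,
                 41.0, 44.6, 47.3, 51.8, 55.2, 60.1, 68.4]
for name, r in sorted(plot_correlations(weld_strength).items(),
                      key=lambda item: item[1], reverse=True):
    print(f"{name:12s} r = {r:.4f}")
```

The correlation coefficient only ranks the candidates. It does not replace a formal goodness-of-fit test, and small differences between candidates should not be over-interpreted.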

Note

When you fit your data with both a 2-parameter distribution and its 3-parameter counterpart, the latter often appears to be a better fit. However, you should use a 3-parameter distribution only if it is significantly better. See Minitab Help for information about choosing between a 2-parameter distribution and a 3-parameter distribution.

To use a normal statistical analysis on nonnormal data

While Minitab offers various options for the analysis of nonnormal data, many users prefer to use the broader palette of normal statistical analyses. Minitab’s Individual Distribution Identification can transform your nonnormal data with the Box-Cox method so that they follow a normal distribution. You can then use the transformed data with any analysis that assumes that the data follow a normal distribution.

  1. Choose Stat > Quality Tools > Individual Distribution Identification.
  2. Specify the column of data to analyze, choose Box-Cox transformation, and check any other distributions to compare it with.
  3. Click OK in each dialog box.

For the transformed data, check whether the data points roughly follow a straight line on the probability plot and whether the p-value is greater than 0.05.

In this case, the probability plot and p-value suggest that the data are successfully transformed to follow a normal distribution. You can now use the transformed data for further analysis.
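To make the transformation itself concrete, here is a minimal Python sketch of the Box-Cox method. It is not Minitab's implementation. It assumes positive data, chooses lambda by maximizing the standard Box-Cox profile log-likelihood over a simple grid, and uses hypothetical function names, a hypothetical grid range, and made-up sample values.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of positive values for a given lambda."""
    if abs(lam) < 1e-12:
        # Lambda of zero is defined as the natural log.
        return [math.log(v) for v in x]
    return [(v ** lam - 1) / lam for v in x]


def box_cox_log_likelihood(x, lam):
    """Profile log-likelihood of lambda, up to an additive constant."""
    n = len(x)
    y = box_cox(x, lam)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    return -0.5 * n * math.log(var_y) + (lam - 1) * sum(math.log(v) for v in x)


def best_lambda(x, grid=None):
    """Grid value of lambda with the highest profile log-likelihood."""
    if grid is None:
        # Assumed search range: -2.0 to 2.0 in steps of 0.05.
        grid = [i / 20 for i in range(-40, 41)]
    return max(grid, key=lambda lam: box_cox_log_likelihood(x, lam))


# Hypothetical right-skewed measurements (made-up values).
sample = [0.8, 1.1, 1.3, 1.6, 1.9, 2.3, 2.8, 3.5, 4.4, 5.9, 8.2, 12.7]
lam = best_lambda(sample)
transformed = box_cox(sample, lam)
print(f"lambda = {lam:.2f}")
print([round(v, 3) for v in transformed])
```

The transformed values still need to be checked for normality. Choosing the best lambda does not guarantee that the result is normal.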

Note

Transformed data are not always normal data. You must check the probability plot and p-value to assess whether the normal distribution fits the transformed data well.

Putting Individual Distribution Identification to Use

It is always a good practice to know the distribution of your data before analyzing them. Minitab’s Individual Distribution Identification is an easy-to-use tool that can help you to identify the distribution of your data as well as eliminate errors and wasted time that result from an inappropriate analysis. You can use this feature to check the fit of a single distribution, or you can use it to compare the fits of several distributions and select the one that fits best. If you prefer to work with normal data, you can even use Individual Distribution Identification to transform your nonnormal data to follow a normal distribution.