This relatively simple question needs to be answered when you make data-driven decisions. But many of us forget to ask it, or respond too quickly and confidently.
Assuming that your data is good is not enough; you need to be sure that it is reliable. That may require a little more work up front, but energy invested in getting good data pays off in better decisions, bigger improvements, greater confidence and, last but not least, savings to the bottom line.
This situation represents a massive amount of new opportunity, but it also brings some significant challenges.
Here are five ways you can ensure confidence in your data.
Failing to plan is a surefire way to get unreliable data. That’s because a solid plan is the key to successful data collection.
Asking why you’re gathering data at the very start of a project will help you pinpoint the data you really need.
A data collection plan should answer these questions:
Having these answers in advance will help you on your way to getting meaningful data.
A thorough data collection plan is a great first step to getting reliable data.
Many quality improvement projects require measurement data for factors like weight, diameter, length and width. Not verifying the accuracy of your measurements practically guarantees that your data—and therefore your results—are not reliable.
A branch of statistics called Measurement Systems Analysis lets you quickly assess and improve your measurement system so you can be sure you’re collecting data that is accurate and precise.
When gathering quantitative data, Gage Repeatability and Reproducibility (R&R) analysis confirms that instruments and operators are measuring parts consistently.
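To make the idea concrete, here is a minimal sketch of how repeatability and reproducibility can be separated. It assumes a balanced, crossed study (every operator measures every part the same number of times) and uses the common two-way ANOVA variance-component approach; the article does not say which method to use, so treat that choice, the function name `gage_rr` and the sample measurements as illustrative assumptions.

```python
from statistics import mean


def gage_rr(data):
    """Estimate variance components for a balanced, crossed Gage R&R study.

    data[part][operator] is a list of repeated measurements. Assumes at least
    two parts, two operators and two trials per part/operator combination.
    """
    parts = list(data)
    operators = list(data[parts[0]])
    p, o = len(parts), len(operators)
    r = len(data[parts[0]][operators[0]])
    cells = [(a, b) for a in parts for b in operators]

    grand = mean(x for a, b in cells for x in data[a][b])
    part_mean = {a: mean(x for b in operators for x in data[a][b]) for a in parts}
    oper_mean = {b: mean(x for a in parts for x in data[a][b]) for b in operators}
    cell_mean = {(a, b): mean(data[a][b]) for a, b in cells}

    # Sums of squares for the two-way ANOVA with interaction.
    ss_part = o * r * sum((part_mean[a] - grand) ** 2 for a in parts)
    ss_oper = p * r * sum((oper_mean[b] - grand) ** 2 for b in operators)
    ss_inter = r * sum(
        (cell_mean[a, b] - part_mean[a] - oper_mean[b] + grand) ** 2
        for a, b in cells
    )
    ss_error = sum((x - cell_mean[a, b]) ** 2 for a, b in cells for x in data[a][b])

    ms_part = ss_part / (p - 1)
    ms_oper = ss_oper / (o - 1)
    ms_inter = ss_inter / ((p - 1) * (o - 1))
    ms_error = ss_error / (p * o * (r - 1))

    # Variance components; negative estimates are truncated to zero.
    repeatability = ms_error
    interaction = max(0.0, (ms_inter - ms_error) / r)
    operator = max(0.0, (ms_oper - ms_inter) / (p * r))
    part_to_part = max(0.0, (ms_part - ms_inter) / (o * r))
    reproducibility = operator + interaction
    total_grr = repeatability + reproducibility

    return {
        "repeatability": repeatability,
        "reproducibility": reproducibility,
        "gage_rr": total_grr,
        "part_to_part": part_to_part,
        "total": total_grr + part_to_part,
    }


# Hypothetical diameters (mm): 3 parts, 2 operators, 2 trials each.
measurements = {
    "part1": {"Ann": [10.01, 10.03], "Bob": [10.06, 10.04]},
    "part2": {"Ann": [12.00, 11.98], "Bob": [12.05, 12.03]},
    "part3": {"Ann": [13.97, 14.01], "Bob": [14.02, 14.04]},
}

for name, value in gage_rr(measurements).items():
    print(f"{name}: {value:.5f}")
```

A small `gage_rr` value relative to `total` suggests that most of the observed variation comes from the parts themselves rather than from the measurement system.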
If you’re grading parts or identifying defects, an Attribute Agreement Analysis verifies that different evaluators are making judgments consistent with each other and with established standards.
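As a rough illustration, agreement between two evaluators can be summarized with simple percent agreement and with Cohen's kappa, which corrects for agreement expected by chance. This is only a sketch: a full attribute agreement analysis typically covers more appraisers, repeated trials and comparison against a known standard, and the ratings below are made up.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Share of items on which the two evaluators gave the same rating."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b)
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1:
        # Both evaluators used a single identical category for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical pass/fail judgments on the same eight parts.
evaluator_1 = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail"]
evaluator_2 = ["pass", "pass", "fail", "fail", "fail", "pass", "pass", "pass"]

print("agreement:", percent_agreement(evaluator_1, evaluator_2))
print("kappa:", cohens_kappa(evaluator_1, evaluator_2))
```

The same functions can compare one evaluator against the established standard by passing the standard's ratings as the second list.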
If you do not examine your measurement system, you’re much more likely to add variation and inconsistency to your data that can wind up clouding your analysis.
As you collect data, be careful to avoid introducing unintended and unaccounted-for variables. These “lurking” variables can make even the most carefully collected data unreliable—and such hidden factors often are insidiously difficult to detect.
Say your hypothesis is that lack of exercise leads to weight gain. One hundred men and one hundred women volunteer for a study to test it. Suppose the study has no controls, such as the use of placebos or random assignment to groups. Then you cannot say whether lack of exercise leads to weight gain, because other factors such as starting weight, occupation or age could be responsible. For example, if all of the women in the study were middle-aged and all of the men were teenagers, age would have a direct effect on weight gain. That makes age a confounding variable, and confounding variables can lead to bias.
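Random assignment is one of the simplest protections against this kind of confounding, because it tends to spread factors like age evenly across groups. A minimal sketch, assuming a hypothetical list of volunteer IDs and two equally sized groups:

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two groups of (near) equal size."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical volunteer IDs for the 200 people in the example.
volunteers = [f"volunteer_{i:03d}" for i in range(1, 201)]
exercise_group, control_group = randomly_assign(volunteers, seed=42)

print(len(exercise_group), len(control_group))
```

Recording the seed alongside the study data lets someone else reproduce the exact assignment later.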
Suppose that data for your company’s key product shows a much larger defect rate for items made by the second shift than items made by the first. Your boss suggests a training program for the second shift.
Members of Shift 2 appear to be good candidates for additional training.
But could something else be going on? Your raw materials come from three different suppliers. What does the defect rate data look like if you include the supplier along with the shift?
Considering just the shift would hide the influence of a factor that could be even more important to defect rates: the supplier.
Now you can see that defect rates for both shifts are higher when using supplier two’s materials. Not accounting for this confounding factor almost led to an expensive “solution” that probably would do little to reduce the overall defect rate.
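The same comparison can be sketched in a few lines of code. The counts below are invented purely to reproduce the pattern described here (they are not the article's data), and `defect_rates` is a hypothetical helper that groups records by whichever fields you name.

```python
from collections import defaultdict


def defect_rates(records, *keys):
    """Return the defect rate for each combination of the given key fields."""
    totals = defaultdict(lambda: [0, 0])  # group -> [defects, units]
    for rec in records:
        group = tuple(rec[k] for k in keys)
        totals[group][0] += rec["defects"]
        totals[group][1] += rec["units"]
    return {group: d / u for group, (d, u) in totals.items()}


# Made-up production counts: shift 2 happens to use far more of supplier 2's material.
RECORDS = [
    {"shift": 1, "supplier": 1, "units": 1000, "defects": 20},
    {"shift": 1, "supplier": 2, "units": 200, "defects": 20},
    {"shift": 1, "supplier": 3, "units": 800, "defects": 16},
    {"shift": 2, "supplier": 1, "units": 300, "defects": 6},
    {"shift": 2, "supplier": 2, "units": 1400, "defects": 140},
    {"shift": 2, "supplier": 3, "units": 300, "defects": 6},
]

print("By shift only:")
for group, rate in sorted(defect_rates(RECORDS, "shift").items()):
    print(f"  shift {group[0]}: {rate:.1%}")

print("By shift and supplier:")
for group, rate in sorted(defect_rates(RECORDS, "shift", "supplier").items()):
    print(f"  shift {group[0]}, supplier {group[1]}: {rate:.1%}")
```

In this invented data set the two shifts have identical defect rates for each supplier; shift 2 only looks worse overall because it processes more of supplier 2's material.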
Even if you’ve been diligent about data collection methods and planning, if your team does not understand why and how you’re gathering data, you can still get bad information. Employees may focus on “making numbers” by embellishing results or by other means.
If data collection is complicated and demanding, it creates many more opportunities for problems. You can encourage good data collection by making it convenient, and by aligning data-related tasks with other responsibilities wherever possible. Providing adequate training about the data collection process will also reduce the potential for errors.
Even if you’ve been careful when gathering data, you can obtain questionable results if you don’t perform an exploratory data analysis. Check the descriptive statistics, including the mean and median values, and the standard deviation. An initial analysis usually also checks whether the data follow a normal distribution, a key assumption in many analyses, or whether some other distribution is a better fit.
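A first pass at these checks needs very little code. The sketch below uses only Python's standard library and made-up measurements; the skewness value is included as a rough symmetry indicator, not as a formal normality test, which would normally come from a statistics package.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    stdev = statistics.stdev(values)  # sample standard deviation

    # Moment-based skewness: near 0 for symmetric data, large in magnitude
    # when one tail is much longer than the other.
    n = len(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    skewness = m3 / m2**1.5 if m2 > 0 else 0.0

    return {"mean": mean, "median": median, "stdev": stdev, "skewness": skewness}


# Hypothetical part weights in grams.
weights = [50.1, 49.8, 50.3, 50.0, 49.9, 50.2, 50.1, 53.9]

for name, value in describe(weights).items():
    print(f"{name}: {value:.3f}")
```

A mean that sits well away from the median, or a skewness far from zero, is a hint to look more closely before relying on methods that assume normality.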
Graphing your data—in a boxplot, scatterplot, or individual value plot—will reveal outliers and oddities. Extreme values can have a big impact on results, so examine these carefully. If you collected your data in sequence, a time series plot may also show unexpected trends or an unusual series of data points.
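If you want a numeric companion to the boxplot, the usual rule flags points that lie more than 1.5 times the interquartile range beyond the quartiles. A minimal sketch with invented data follows; note that software packages compute quartiles in slightly different ways, so the fences may not match a given boxplot exactly.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]


# Hypothetical cycle times in seconds; one reading looks suspicious.
cycle_times = [10, 11, 12, 12, 13, 13, 14, 40]
print(iqr_outliers(cycle_times))
```

A flagged value is a prompt to investigate, not an automatic reason to delete it: it may be a data entry mistake, or it may be the most informative point in the data set.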
Reviewing the raw data in a worksheet also helps. Sorting it by different fields can reveal data entry mistakes and inconsistencies, variations in coding, and other errors.
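Coding variations are easy to miss by eye but simple to catch programmatically. The sketch below groups text entries that differ only in capitalization or spacing; the supplier names are hypothetical, and real data may need further rules (abbreviations, misspellings) that this does not cover.

```python
from collections import defaultdict


def find_inconsistent_codes(values):
    """Group entries that match after normalizing case and whitespace.

    Returns only the groups that contain more than one distinct spelling.
    """
    variants = defaultdict(set)
    for value in values:
        key = " ".join(value.split()).casefold()
        variants[key].add(value)
    return {key: sorted(group) for key, group in variants.items() if len(group) > 1}


# Hypothetical entries from a "supplier" column.
suppliers = ["Supplier A", "supplier a", "Supplier A ", "Supplier B", "Supplier B"]
print(find_inconsistent_codes(suppliers))
```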
Failing to check your data before starting the “heavy lifting” with more sophisticated statistics can result in an analysis that requires much more time, or leads to unreliable conclusions!
The easier data is for team members to gather, and the more they understand why it’s important, the better the data you’ll get.
Improving quality is not easy, and nobody sets out to waste time or sabotage their efforts by not collecting good data. But as these reminders show, it's all too easy to get problem data even when you're being careful!
Data is the new gold: 5 ways to make sure your data is reliable