This is Rod. Rod’s best friend is Jane. Rod always tries to help his friend. The other day, Rod set Jane up on a date with a guy he promised was kind, handsome, and funny. He wasn’t. When Jane called about a second date, Rod admitted that the guy probably wasn’t so kind, probably wasn’t so handsome, and was definitely not funny. Then Rod tried to help Jane plan a dinner for her parents. He recommended a restaurant that he said was excellent. Jane made reservations, and two hours later Rod called back and said he had changed his mind; the restaurant was gross. Jane had to change her reservations. Rod is an unreliable source of information. What he tells you in one moment might change five minutes later. You don’t know what to believe. In social research, measurement can be unreliable too. As we’ll see in a moment, Rod was unreliable in a specific kind of way, and there are many different ways that a source of information can be unreliable. When a source is unreliable, we can’t trust the measures that we get. Our measures are all over the place, and we can’t tell when we’re capturing true variation and when we’re just capturing noise or error produced by a measurement process that can’t deliver consistent results. More specifically, measurement reliability is concerned with whether the questions we ask, or the way we collect data, somehow distort the measurements we’re getting. When measurements are unreliable, we can’t trust the numbers, and if we can’t trust the numbers, we can’t trust the analysis.
I’ll talk about three types of reliability: stability, representative, and equivalence. Stability reliability asks: if I pose the same question to a person at different times, will I get the same answer or a different one? In other words, do the answers we get depend on the respondent’s mood or on the situation in which we ask? If they do, we can’t be confident that the answers we’re collecting are tapping into our subject’s true self. Let me give you an example of a question that is unreliable in terms of stability. Say I want to measure your level of depression, and I ask: “How are you feeling today?” People will usually answer that question in very different ways on different days. When I ask, “How are you feeling today?”, I might not be tapping into their level of depression, which is what I’m trying to measure. Instead, I might be capturing their mood, which changes quickly and isn’t of interest to my study. One way to improve this question is to ask people about their mood in general. For example, “Out of every seven days of the week, how many days do you feel bad?”
Representative reliability asks: if I pose the same question to people from different backgrounds or different communities, will I get different answers? Not because these people truly differ, but because they interpret the question differently. For example, it’s known that people with high incomes or high net worth tend to understate how much they make when asked by surveyors. A person earning fifty or sixty thousand dollars a year will readily give that information, and it will be accurate, while a person who earns a lot of money might lowball how much they earn. The measurement works fine for people with middle-class incomes but very badly for people with higher incomes. This is the kind of question that is plagued by representative reliability problems: different parts of the population react to questions in different ways and adjust their answers.
The third type of reliability is equivalence reliability. Equivalence reliability matters when you’re using multiple questions to measure the same underlying concept. For example, to measure depression, I might ask a battery of questions that each try to tap into whether somebody is depressed: “Do you cry often?”, “Do you often feel sad?”, “Do you find it hard to get out of bed?”, “Do you often have no appetite, or overeat?”, “Do you have suicidal thoughts?”. If these questions have equivalence reliability, their answers will tend to be correlated. In other words, people who answer “yes” to one of these questions are more likely to answer “yes” to the others.
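One common way to summarize how well a battery of items hangs together is Cronbach’s alpha, which compares the variance of each item to the variance of respondents’ total scores. A stdlib-only sketch with invented yes/no data (the respondents and answers are hypothetical, not from any real survey):

```python
from statistics import pvariance

# Hypothetical 1/0 ("yes"/"no") answers to the five depression items,
# one row per respondent, one column per item. Illustrative data only.
answers = [
    [1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 1, 1, 1],
]

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(rows[0])
    items = list(zip(*rows))                       # transpose: one tuple per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(r) for r in rows])  # variance of each respondent's total
    return k / (k - 1) * (1 - item_var / total_var)

print(round(cronbach_alpha(answers), 2))  # → 0.8
```

Values closer to 1 indicate that the items move together, as they should if they measure one underlying concept; a value near 0 would suggest the battery lacks equivalence reliability.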
How do we get around reliability issues? There are a few things you can do. First, be very careful that the questions you ask don’t depend on people’s moods or on situational factors. Instead of asking how people feel at the moment, ask about their feelings in general. Second, collect a lot of measures. Collect information on respondent demographics that can help you figure out whether your questions are being understood and answered in the same way by different subsections of the population. Also, use multiple measures to capture each concept you’re interested in. That lets you test whether at least some of your measures agree with one another, which is a good indication that they’re reliable. Maybe the best piece of advice, though, is to prepare in two ways. First, look at the literature: see how other people have measured your concepts and what has worked for other researchers. Second, pretest your surveys or measurement schemes. That means administering a small-scale survey and then having in-depth discussions with the respondents to make sure they understood each question properly, that they meant to answer the way they did, and that different types of respondents don’t come to each question with different understandings or biases. Reliability is a problem. It’s something you have to control if you want to trust your data. Don’t ignore it.