7.1 Overview of Nonexperimental Research
- Define nonexperimental research, distinguish it clearly from experimental research, and give several examples.
- Explain when a researcher might choose to conduct nonexperimental research as opposed to experimental research.
What Is Nonexperimental Research?
Nonexperimental research is research that lacks the manipulation of an independent variable, random assignment of participants to conditions or orders of conditions, or both.
In a sense, it is unfair to define this large and diverse set of approaches collectively by what they are not. But doing so reflects the fact that most researchers in psychology consider the distinction between experimental and nonexperimental research to be an extremely important one. This is because while experimental research can provide strong evidence that changes in an independent variable cause differences in a dependent variable, nonexperimental research generally cannot. As we will see, however, this does not mean that nonexperimental research is less important than experimental research or inferior to it in any general sense.
When to Use Nonexperimental Research
As we saw in Chapter 6 "Experimental Research", experimental research is appropriate when the researcher has a specific research question or hypothesis about a causal relationship between two variables—and it is possible, feasible, and ethical to manipulate the independent variable and randomly assign participants to conditions or to orders of conditions. It stands to reason, therefore, that nonexperimental research is appropriate—even necessary—when these conditions are not met. There are many ways in which this can be the case.
- The research question or hypothesis can be about a single variable rather than a statistical relationship between two variables (e.g., How accurate are people’s first impressions?).
- The research question can be about a noncausal statistical relationship between variables (e.g., Is there a correlation between verbal intelligence and mathematical intelligence?).
- The research question can be about a causal relationship, but the independent variable cannot be manipulated or participants cannot be randomly assigned to conditions or orders of conditions (e.g., Does damage to a person’s hippocampus impair the formation of long-term memory traces?).
- The research question can be broad and exploratory, or it can be about what it is like to have a particular experience (e.g., What is it like to be a working mother diagnosed with depression?).
Again, the choice between the experimental and nonexperimental approaches is generally dictated by the nature of the research question. If it is about a causal relationship and involves an independent variable that can be manipulated, the experimental approach is typically preferred. Otherwise, the nonexperimental approach is preferred. But the two approaches can also be used to address the same research question in complementary ways. For example, nonexperimental studies establishing that there is a relationship between watching violent television and aggressive behavior have been complemented by experimental studies confirming that the relationship is a causal one (Bushman & Huesmann, 2001). Similarly, after his original study, Milgram conducted experiments to explore the factors that affect obedience. He manipulated several independent variables, such as the distance between the experimenter and the participant, the distance between the participant and the confederate, and the location of the study (Milgram, 1974).
Types of Nonexperimental Research
Nonexperimental research falls into three broad categories: single-variable research, correlational and quasi-experimental research, and qualitative research. First, research can be nonexperimental because it focuses on a single variable rather than a statistical relationship between two variables. Although there is no widely shared term for this kind of research, we will call it single-variable research. Milgram’s original obedience study was nonexperimental in this way. He was primarily interested in one variable—the extent to which participants obeyed the researcher when he told them to shock the confederate—and he observed all participants performing the same task under the same conditions. The study by Loftus and Pickrell described at the beginning of this chapter is also a good example of single-variable research. The variable was whether participants “remembered” having experienced mildly traumatic childhood events (e.g., getting lost in a shopping mall) that they had not actually experienced but that the researchers asked them about repeatedly. In this particular study, nearly a third of the participants “remembered” at least one event. (As with Milgram’s original study, this study inspired several later experiments on the factors that affect false memories.)
As these examples make clear, single-variable research can answer interesting and important questions. What it cannot do, however, is answer questions about statistical relationships between variables. This is a point that beginning researchers sometimes miss. Imagine, for example, a group of research methods students interested in the relationship between children’s being the victim of bullying and the children’s self-esteem. The first thing that is likely to occur to these researchers is to obtain a sample of middle-school students who have been bullied and then to measure their self-esteem. But this would be a single-variable study with self-esteem as the only variable. Although it would tell the researchers something about the self-esteem of children who have been bullied, it would not tell them what they really want to know, which is how the self-esteem of children who have been bullied compares with the self-esteem of children who have not. Is it lower? Is it the same? Could it even be higher? To answer this question, their sample would also have to include middle-school students who have not been bullied.
Research can also be nonexperimental because it focuses on a statistical relationship between two variables but does not include the manipulation of an independent variable, random assignment of participants to conditions or orders of conditions, or both. This kind of research takes two basic forms: correlational research and quasi-experimental research. In correlational research, the researcher measures the two variables of interest with little or no attempt to control extraneous variables and then assesses the relationship between them. A research methods student who finds out whether each of several middle-school students has been bullied and then measures each student’s self-esteem is conducting correlational research. In quasi-experimental research, the researcher manipulates an independent variable but does not randomly assign participants to conditions or orders of conditions. For example, a researcher might start an antibullying program (a kind of treatment) at one school and compare the incidence of bullying at that school with the incidence at a similar school that has no antibullying program.
The final way in which research can be nonexperimental is that it can be qualitative. The types of research we have discussed so far are all quantitative, referring to the fact that the data consist of numbers that are analyzed using statistical techniques. In qualitative research, the data are usually nonnumerical and are analyzed using nonstatistical techniques. Rosenhan’s study of the experience of people in a psychiatric ward was primarily qualitative. The data were the notes taken by the “pseudopatients”—the people pretending to have heard voices—along with their hospital records. Rosenhan’s analysis consisted mainly of a written description of the experiences of the pseudopatients, supported by several concrete examples. To illustrate the hospital staff’s tendency to “depersonalize” their patients, he noted, “Upon being admitted, I and other pseudopatients took the initial physical examinations in a semipublic room, where staff members went about their own business as if we were not there” (Rosenhan, 1973, p. 256).
Internal Validity Revisited
Recall that internal validity is the extent to which the design of a study supports the conclusion that changes in the independent variable caused any observed differences in the dependent variable. Figure 7.1 shows how experimental, quasi-experimental, and correlational research vary in terms of internal validity. Experimental research tends to be highest because manipulation of the independent variable addresses the directionality problem and random assignment of participants to conditions addresses the third-variable problem by controlling extraneous variables. If the average score on the dependent variable in an experiment differs across conditions, it is quite likely that the independent variable is responsible for that difference. Correlational research is lowest because it fails to address either problem. If the average score on the dependent variable differs across levels of the independent variable, it could be that the independent variable is responsible, but there are other interpretations. In some situations, the direction of causality could be reversed. In others, there could be a third variable that is causing differences in both the independent and dependent variables. Quasi-experimental research is in the middle because the manipulation of the independent variable addresses some problems, but the lack of random assignment and experimental control fails to address others. Imagine, for example, that a researcher finds two similar schools, starts an antibullying program in one, and then finds fewer bullying incidents in that “treatment school” than in the “control school.” There is no directionality problem because clearly the number of bullying incidents did not determine which school got the program. However, the lack of random assignment of children to schools could still mean that students in the treatment school differed from students in the control school in some other way that could explain the difference in bullying.
Experiments are generally high in internal validity, quasi-experiments lower, and correlational studies lower still.
Notice also in Figure 7.1 that there is some overlap in the internal validity of experiments, quasi-experiments, and correlational studies. For example, a poorly designed experiment that includes many confounding variables can be lower in internal validity than a well-designed quasi-experiment with no obvious confounding variables.
- Nonexperimental research is research that lacks the manipulation of an independent variable, control of extraneous variables through random assignment, or both.
- There are three broad types of nonexperimental research. Single-variable research focuses on a single variable rather than a relationship between variables. Correlational and quasi-experimental research focus on a statistical relationship but lack manipulation or random assignment. Qualitative research focuses on broader research questions, typically involves collecting large amounts of data from a small number of participants, and analyzes the data nonstatistically.
- In general, experimental research is high in internal validity, correlational research is low in internal validity, and quasi-experimental research is in between.
Discussion: For each of the following studies, decide which type of research design it is and explain why.
- A researcher conducts detailed interviews with unmarried teenage fathers to learn about how they feel and what they think about their role as fathers and summarizes their feelings in a written narrative.
- A researcher measures the impulsivity of a large sample of drivers and looks at the statistical relationship between this variable and the number of traffic tickets the drivers have received.
- A researcher randomly assigns patients with low back pain either to a treatment involving hypnosis or to a treatment involving exercise. She then measures their level of low back pain after 3 months.
- A college instructor gives weekly quizzes to students in one section of his course but no weekly quizzes to students in another section to see whether this has an effect on their test performance.
7.2 Correlational Research
- Define correlational research and give several examples.
- Explain why a researcher might choose to conduct correlational research rather than experimental research or another type of nonexperimental research.
What Is Correlational Research?
Correlational research is a type of nonexperimental research in which the researcher measures two variables and assesses the statistical relationship (i.e., the correlation) between them with little or no effort to control extraneous variables. There are essentially two reasons that researchers interested in statistical relationships between variables would choose to conduct a correlational study rather than an experiment. The first is that they do not believe that the statistical relationship is a causal one. For example, a researcher might evaluate the validity of a brief extraversion test by administering it to a large group of participants along with a longer extraversion test that has already been shown to be valid. This researcher might then check to see whether participants’ scores on the brief test are strongly correlated with their scores on the longer one. Neither test score is thought to cause the other, so there is no independent variable to manipulate. In fact, the terms independent variable and dependent variable do not apply to this kind of research.
The other reason that researchers would choose to use a correlational study rather than an experiment is that the statistical relationship of interest is thought to be causal, but the researcher cannot manipulate the independent variable because it is impossible, impractical, or unethical. For example, Allen Kanner and his colleagues thought that the number of “daily hassles” (e.g., rude salespeople, heavy traffic) that people experience affects the number of physical and psychological symptoms they have (Kanner, Coyne, Schaefer, & Lazarus, 1981). But because they could not manipulate the number of daily hassles their participants experienced, they had to settle for measuring the number of daily hassles—along with the number of symptoms—using self-report questionnaires. Although the strong positive relationship they found between these two variables is consistent with their idea that hassles cause symptoms, it is also consistent with the idea that symptoms cause hassles or that some third variable (e.g., neuroticism) causes both.
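Kanner and colleagues’ analysis boils down to computing a correlation coefficient between two measured variables. The sketch below computes Pearson’s r from scratch using invented hassle and symptom counts (the numbers are illustrative only, not data from the actual study):

```python
import math

# Invented hassle and symptom counts for eight participants
# (illustrative only; not Kanner et al.'s data).
hassles = [5, 12, 8, 20, 15, 3, 10, 18]
symptoms = [2, 6, 3, 10, 8, 2, 4, 7]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(round(pearson_r(hassles, symptoms), 2))  # -> 0.96
```

A strong positive r like this is consistent with hassles causing symptoms, but, as the text notes, it is equally consistent with symptoms causing hassles or with a third variable driving both.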
A common misconception among beginning researchers is that correlational research must involve two quantitative variables, such as scores on two extraversion tests or the number of hassles and number of symptoms people have experienced. However, the defining feature of correlational research is that the two variables are measured—neither one is manipulated—and this is true regardless of whether the variables are quantitative or categorical. Imagine, for example, that a researcher administers the Rosenberg Self-Esteem Scale to 50 American college students and 50 Japanese college students. Although this “feels” like a between-subjects experiment, it is a correlational study because the researcher did not manipulate the students’ nationalities. The same is true of the study by Cacioppo and Petty comparing college faculty and factory workers in terms of their need for cognition. It is a correlational study because the researchers did not manipulate the participants’ occupations.
Figure 7.2 "Results of a Hypothetical Study on Whether People Who Make Daily To-Do Lists Experience Less Stress Than People Who Do Not Make Such Lists" shows data from a hypothetical study on the relationship between whether people make a daily list of things to do (a “to-do list”) and stress. Notice that it is unclear whether this is an experiment or a correlational study because it is unclear whether the independent variable was manipulated. If the researcher randomly assigned some participants to make daily to-do lists and others not to, then it is an experiment. If the researcher simply asked participants whether they made daily to-do lists, then it is a correlational study. The distinction is important because if the study was an experiment, then it could be concluded that making the daily to-do lists reduced participants’ stress. But if it was a correlational study, it could only be concluded that these variables are statistically related. Perhaps being stressed has a negative effect on people’s ability to plan ahead (the directionality problem). Or perhaps people who are more conscientious are more likely to make to-do lists and less likely to be stressed (the third-variable problem). The crucial point is that what defines a study as experimental or correlational is not the variables being studied, nor whether the variables are quantitative or categorical, nor the type of graph or statistics used to analyze the data. It is how the study is conducted.
Figure 7.2 Results of a Hypothetical Study on Whether People Who Make Daily To-Do Lists Experience Less Stress Than People Who Do Not Make Such Lists
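Whichever way the to-do list study was conducted, the analysis itself looks the same: compare the average stress of the two groups. A toy sketch with invented numbers makes the point:

```python
# Hypothetical data in the spirit of Figure 7.2 (all numbers invented).
# 1 = makes a daily to-do list, 0 = does not; stress rated on a 1-10 scale.
makes_list = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
stress = [3, 4, 2, 5, 3, 6, 7, 5, 8, 6]

def group_mean(group):
    """Mean stress score for participants in the given group."""
    scores = [s for g, s in zip(makes_list, stress) if g == group]
    return sum(scores) / len(scores)

print(group_mean(1))  # list makers     -> 3.4
print(group_mean(0))  # non-list makers -> 6.4
```

The means differ either way; whether that difference can be interpreted causally depends entirely on whether group membership was assigned by the researcher or merely measured.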
Data Collection in Correlational Research
Again, the defining feature of correlational research is that neither variable is manipulated. It does not matter how or where the variables are measured. A researcher could have participants come to a laboratory to complete a computerized backward digit span task and a computerized risky decision-making task and then assess the relationship between participants’ scores on the two tasks. Or a researcher could go to a shopping mall to ask people about their attitudes toward the environment and their shopping habits and then assess the relationship between these two variables. Both of these studies would be correlational because no independent variable is manipulated. However, because some approaches to data collection are strongly associated with correlational research, it makes sense to discuss them here. The two we will focus on are naturalistic observation and archival data. A third, survey research, is discussed in its own chapter.
Naturalistic observation is an approach to data collection that involves observing people’s behavior in the environment in which it typically occurs. Thus naturalistic observation is a type of field research (as opposed to a type of laboratory research). It could involve observing shoppers in a grocery store, children on a school playground, or psychiatric inpatients in their wards. Researchers engaged in naturalistic observation usually make their observations as unobtrusively as possible so that participants are often not aware that they are being studied. Ethically, this is considered to be acceptable if the participants remain anonymous and the behavior occurs in a public setting where people would not normally have an expectation of privacy. Grocery shoppers putting items into their shopping carts, for example, are engaged in public behavior that is easily observable by store employees and other shoppers. For this reason, most researchers would consider it ethically acceptable to observe them for a study. On the other hand, one of the arguments against the ethicality of the naturalistic observation of “bathroom behavior” discussed earlier in the book is that people have a reasonable expectation of privacy even in a public restroom and that this expectation was violated.
Researchers Robert Levine and Ara Norenzayan used naturalistic observation to study differences in the “pace of life” across countries (Levine & Norenzayan, 1999). One of their measures involved observing pedestrians in a large city to see how long it took them to walk 60 feet. They found that people in some countries walked reliably faster than people in other countries. For example, people in the United States and Japan covered 60 feet in about 12 seconds on average, while people in Brazil and Romania took close to 17 seconds.
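The walking-time figures above translate into average speeds with simple arithmetic. The snippet below is just a unit conversion using the approximate times reported in the text:

```python
# Average speeds implied by the approximate times in the text:
# roughly 12 seconds per 60 feet in the US and Japan, roughly 17
# seconds in Brazil and Romania.
distance_ft = 60
for place, seconds in [("US/Japan", 12), ("Brazil/Romania", 17)]:
    print(f"{place}: {distance_ft / seconds:.1f} ft/s")  # 5.0 and 3.5 ft/s
```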
Because naturalistic observation takes place in the complex and even chaotic “real world,” there are two closely related issues that researchers must deal with before collecting data. The first is sampling. When, where, and under what conditions will the observations be made, and who exactly will be observed? Levine and Norenzayan described their sampling process as follows:
Male and female walking speed over a distance of 60 feet was measured in at least two locations in main downtown areas in each city. Measurements were taken during main business hours on clear summer days. All locations were flat, unobstructed, had broad sidewalks, and were sufficiently uncrowded to allow pedestrians to move at potentially maximum speeds. To control for the effects of socializing, only pedestrians walking alone were used. Children, individuals with obvious physical handicaps, and window-shoppers were not timed. Thirty-five men and 35 women were timed in most cities. (p. 186)
Precise specification of the sampling process in this way makes data collection manageable for the observers, and it also provides some control over important extraneous variables. For example, by making their observations on clear summer days in all countries, Levine and Norenzayan controlled for effects of the weather on people’s walking speeds.
The second issue is measurement. What specific behaviors will be observed? In Levine and Norenzayan’s study, measurement was relatively straightforward. They simply measured out a 60-foot distance along a city sidewalk and then used a stopwatch to time participants as they walked over that distance. Often, however, the behaviors of interest are not so obvious or objective. For example, researchers Robert Kraut and Robert Johnston wanted to study bowlers’ reactions to their shots, both when they were facing the pins and then when they turned toward their companions (Kraut & Johnston, 1979). But what “reactions” should they observe? Based on previous research and their own pilot testing, Kraut and Johnston created a list of reactions that included “closed smile,” “open smile,” “laugh,” “neutral face,” “look down,” “look away,” and “face cover” (covering one’s face with one’s hands). The observers committed this list to memory and then practiced by coding the reactions of bowlers who had been videotaped. During the actual study, the observers spoke into an audio recorder, describing the reactions they observed. Among the most interesting results of this study was that bowlers rarely smiled while they still faced the pins. They were much more likely to smile after they turned toward their companions, suggesting that smiling is not purely an expression of happiness but also a form of social communication.
Naturalistic observation has revealed that bowlers tend to smile when they turn away from the pins and toward their companions, suggesting that smiling is not purely an expression of happiness but also a form of social communication.
When the observations require a judgment on the part of the observers—as in Kraut and Johnston’s study—this process is often described as coding. Coding generally requires clearly defining a set of target behaviors. The observers then categorize participants individually in terms of which behavior they have engaged in and the number of times they engaged in each behavior. The observers might even record the duration of each behavior. The target behaviors must be defined in such a way that different observers code them in the same way. This is the issue of interrater reliability. Researchers are expected to demonstrate the interrater reliability of their coding procedure by having multiple raters code the same behaviors independently and then showing that the different observers are in close agreement. Kraut and Johnston, for example, video recorded a subset of their participants’ reactions and had two observers independently code them. The two observers agreed on the reactions that were exhibited 97% of the time, indicating good interrater reliability.
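Percent agreement, the interrater reliability statistic Kraut and Johnston reported, is straightforward to compute. The codes and data below are invented for illustration; in practice a chance-corrected statistic such as Cohen’s kappa is often reported as well:

```python
# Two observers' codes for the same five bowler reactions
# (codes and data invented for illustration).
observer_a = ["open smile", "neutral", "laugh", "look down", "open smile"]
observer_b = ["open smile", "neutral", "laugh", "neutral", "open smile"]

# Percent agreement: the proportion of observations coded identically.
matches = sum(a == b for a, b in zip(observer_a, observer_b))
agreement = matches / len(observer_a)
print(f"{agreement:.0%}")  # -> 80%
```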
Another approach to correlational research is the use of archival data—data, such as school and hospital records, newspaper and magazine articles, or Internet content, that were collected or created for some other purpose. An example is a study by Brett Pelham and his colleagues on “implicit egotism”—the tendency for people to prefer people, places, and things that are similar to themselves (Pelham, Carvallo, & Jones, 2005). In one study, they examined Social Security records to show that women with the names Virginia, Georgia, Louise, and Florence were especially likely to have moved to the states of Virginia, Georgia, Louisiana, and Florida, respectively.
As with naturalistic observation, measurement can be more or less straightforward when working with archival data. For example, counting the number of people named Virginia who live in various states based on Social Security records is relatively straightforward. But consider a study by Christopher Peterson and his colleagues on the relationship between optimism and health using data that had been collected many years before for a study on adult development (Peterson, Seligman, & Vaillant, 1988). In the 1940s, healthy male college students had completed an open-ended questionnaire about difficult wartime experiences. In the late 1980s, Peterson and his colleagues reviewed the men’s questionnaire responses to obtain a measure of explanatory style—their habitual ways of explaining bad events that happen to them. More pessimistic people tend to blame themselves and expect long-term negative consequences that affect many aspects of their lives, while more optimistic people tend to blame outside forces and expect limited negative consequences. To obtain a measure of explanatory style for each participant, the researchers used a procedure in which all negative events mentioned in the questionnaire responses, and any causal explanations for them, were identified and written on index cards. These were given to a separate group of raters who rated each explanation in terms of three separate dimensions of optimism-pessimism. These ratings were then averaged to produce an explanatory style score for each participant. The researchers then assessed the statistical relationship between the men’s explanatory style as college students and archival measures of their health at approximately 60 years of age. The primary result was that the more optimistic the men were as college students, the healthier they were as older men. Pearson’s r was +.25.
This is an example of content analysis—a family of systematic approaches to measurement using complex archival data. Just as naturalistic observation requires specifying the behaviors of interest and then noting them as they occur, content analysis requires specifying keywords, phrases, or ideas and then finding all occurrences of them in the data. These occurrences can then be counted, timed (e.g., the amount of time devoted to entertainment topics on the nightly news show), or analyzed in a variety of other ways.
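As a toy illustration of the counting step in content analysis, the snippet below tallies occurrences of a few target words in a made-up archival text (both the text and the keywords are hypothetical):

```python
# Counting target words in a made-up archival text: one simple
# quantitative summary of the kind used in content analysis.
text = ("The patient reported heavy traffic, rude salespeople, and more "
        "heavy traffic on the way to the clinic.")

keywords = ["traffic", "rude", "clinic"]
# Normalize case and strip punctuation before splitting into words.
words = text.lower().replace(",", " ").replace(".", " ").split()
counts = {k: words.count(k) for k in keywords}
print(counts)  # -> {'traffic': 2, 'rude': 1, 'clinic': 1}
```

Real content analysis projects define coding rules far more carefully, but the basic logic of specifying content ahead of time and then counting it is the same.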
- Correlational research involves measuring two variables and assessing the relationship between them, with no manipulation of an independent variable.
- Correlational research is not defined by where or how the data are collected. However, some approaches to data collection are strongly associated with correlational research. These include naturalistic observation (in which researchers observe people’s behavior in the context in which it normally occurs) and the use of archival data that were already collected for some other purpose.
Discussion: For each of the following, decide whether it is most likely that the study described is experimental or correlational and explain why.
- An educational researcher compares the academic performance of students from the “rich” side of town with that of students from the “poor” side of town.
- A cognitive psychologist compares the ability of people to recall words that they were instructed to “read” with their ability to recall words that they were instructed to “imagine.”
- A manager studies the correlation between new employees’ college grade point averages and their first-year performance reports.
- An automotive engineer installs different stick shifts in a new car prototype, each time asking several people to rate how comfortable the stick shift feels.
- A food scientist studies the relationship between the temperature inside people’s refrigerators and the amount of bacteria on their food.
- A social psychologist tells some research participants that they need to hurry over to the next building to complete a study. She tells others that they can take their time. Then she observes whether they stop to help a research assistant who is pretending to be hurt.
7.3 Quasi-Experimental Research
- Explain what quasi-experimental research is and distinguish it clearly from both experimental and correlational research.
- Describe three different types of quasi-experimental research designs (nonequivalent groups, pretest-posttest, and interrupted time series) and identify examples of each one.
The prefix quasi means “resembling.” Thus quasi-experimental research is research that resembles experimental research but is not true experimental research. Although the independent variable is manipulated, participants are not randomly assigned to conditions or orders of conditions (Cook & Campbell, 1979). Because the independent variable is manipulated before the dependent variable is measured, quasi-experimental research eliminates the directionality problem. But because participants are not randomly assigned—making it likely that there are other differences between conditions—quasi-experimental research does not eliminate the problem of confounding variables. In terms of internal validity, therefore, quasi-experiments are generally somewhere between correlational studies and true experiments.
Quasi-experiments are most likely to be conducted in field settings in which random assignment is difficult or impossible. They are often conducted to evaluate the effectiveness of a treatment—perhaps a type of psychotherapy or an educational intervention. There are many different kinds of quasi-experiments, but we will discuss just a few of the most common ones here.
Nonequivalent Groups Design
Recall that when participants in a between-subjects experiment are randomly assigned to conditions, the resulting groups are likely to be quite similar. In fact, researchers consider them to be equivalent. When participants are not randomly assigned to conditions, however, the resulting groups are likely to be dissimilar in some ways. For this reason, researchers consider them to be nonequivalent. A nonequivalent groups design, then, is a between-subjects design in which participants have not been randomly assigned to conditions, usually because they are in preexisting groups (e.g., students at different schools).
Imagine, for example, a researcher who wants to evaluate a new method of teaching fractions to third graders. One way would be to conduct a study with a treatment group consisting of one class of third-grade students (taught by, say, Ms. Williams) and a control group consisting of another class of third-grade students (taught by Mr. Jones). This would be a nonequivalent groups design because the students are not randomly assigned to classes by the researcher, which means there could be important differences between them. For example, the parents of higher achieving or more motivated students might have been more likely to request that their children be assigned to Ms. Williams’s class. Or the principal might have assigned the “troublemakers” to Mr. Jones’s class because he is a stronger disciplinarian. Of course, the teachers’ styles, and even the classroom environments, might be very different and might cause different levels of achievement or motivation among the students. If at the end of the study there was a difference in the two classes’ knowledge of fractions, it might have been caused by the difference between the teaching methods—but it might have been caused by any of these confounding variables.
Of course, researchers using a nonequivalent groups design can take steps to ensure that their groups are as similar as possible. In the present example, the researcher could try to select two classes at the same school, where the students in the two classes have similar scores on a standardized math test and the teachers are the same sex, are close in age, and have similar teaching styles. Taking such steps would increase the internal validity of the study because it would eliminate some of the most important confounding variables. But without true random assignment of the students to conditions, there remains the possibility of other important confounding variables that the researcher was not able to control.
Pretest-Posttest Design
In a pretest-posttest design, the dependent variable is measured once before the treatment is implemented and once after it is implemented. Imagine, for example, a researcher who is interested in the effectiveness of an antidrug education program on elementary school students’ attitudes toward illegal drugs. The researcher could measure the attitudes of students at a particular elementary school during one week, implement the antidrug program during the next week, and finally, measure their attitudes again the following week. The pretest-posttest design is much like a within-subjects experiment in which each participant is tested first under the control condition and then under the treatment condition. It is unlike a within-subjects experiment, however, in that the order of conditions is not counterbalanced because it typically is not possible for a participant to be tested in the treatment condition first and then in an “untreated” control condition.
If the average posttest score is better than the average pretest score, then it makes sense to conclude that the treatment might be responsible for the improvement. Unfortunately, one often cannot conclude this with a high degree of certainty because there may be other explanations for why the posttest scores are better. One category of alternative explanations goes under the name of history: extraneous events that occur between the pretest and the posttest. Perhaps an antidrug program aired on television and many of the students watched it, or perhaps a celebrity died of a drug overdose and many of the students heard about it. Another category of alternative explanations goes under the name of maturation: developmental changes in the participants themselves. Participants might have changed between the pretest and the posttest in ways that they were going to anyway because they are growing and learning. If it were a yearlong program, participants might become less impulsive or better reasoners and this might be responsible for the change.
Another alternative explanation for a change in the dependent variable in a pretest-posttest design is regression to the mean. This refers to the statistical fact that an individual who scores extremely on a variable on one occasion will tend to score less extremely on the next occasion. For example, a bowler with a long-term average of 150 who suddenly bowls a 220 will almost certainly score lower in the next game. Her score will “regress” toward her mean score of 150. Regression to the mean can be a problem when participants are selected for further study because of their extreme scores. Imagine, for example, that only students who scored especially low on a test of fractions are given a special training program and then retested. Regression to the mean all but guarantees that their scores will be higher even if the training program has no effect. A closely related concept—and an extremely important one in psychological research—is spontaneous remission. This is the tendency for many medical and psychological problems to improve over time without any form of treatment. The common cold is a good example. If one were to measure symptom severity in 100 common cold sufferers today, give them a bowl of chicken soup every day, and then measure their symptom severity again in a week, they would probably be much improved. This does not mean that the chicken soup was responsible for the improvement, however, because they would have been much improved without any treatment at all. The same is true of many psychological problems. A group of severely depressed people today is likely to be less depressed on average in 6 months.
In reviewing the results of several studies of treatments for depression, researchers Michael Posternak and Ivan Miller found that participants in waitlist control conditions improved an average of 10 to 15% before they received any treatment at all (Posternak & Miller, 2001). Thus one must generally be very cautious about inferring causality from pretest-posttest designs.
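Regression to the mean is easy to demonstrate with a small simulation. This is a minimal sketch with invented numbers; it assumes each test score is a stable “true skill” plus independent measurement noise:

```python
import random

random.seed(42)

# Each student has a stable "true skill"; each test score adds independent noise.
N = 10_000
true_skill = [random.gauss(50, 10) for _ in range(N)]
test1 = [s + random.gauss(0, 10) for s in true_skill]
test2 = [s + random.gauss(0, 10) for s in true_skill]  # no treatment in between

# Select the students who scored lowest on the first test (roughly the bottom 10%).
cutoff = sorted(test1)[N // 10]
low_ids = [i for i in range(N) if test1[i] <= cutoff]

mean1 = sum(test1[i] for i in low_ids) / len(low_ids)
mean2 = sum(test2[i] for i in low_ids) / len(low_ids)
print(f"Bottom decile, test 1: {mean1:.1f}")
print(f"Bottom decile, test 2: {mean2:.1f}")  # higher, despite no intervention
```

The selected group improves substantially on retest even though nothing was done to them, which is exactly why a training program given only to low scorers can look effective when it is not.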
Does Psychotherapy Work?
Early studies on the effectiveness of psychotherapy tended to use pretest-posttest designs. In a classic 1952 article, researcher Hans Eysenck summarized the results of 24 such studies showing that about two thirds of patients improved between the pretest and the posttest (Eysenck, 1952). But Eysenck also compared these results with archival data from state hospital and insurance company records showing that similar patients recovered at about the same rate without receiving psychotherapy. This suggested to Eysenck that the improvement that patients showed in the pretest-posttest studies might be no more than spontaneous remission. Note that Eysenck did not conclude that psychotherapy was ineffective. He merely concluded that there was no evidence that it was, and he wrote of “the necessity of properly planned and executed experimental studies into this important field” (p. 323).
Fortunately, many other researchers took up Eysenck’s challenge, and by 1980 hundreds of experiments had been conducted in which participants were randomly assigned to treatment and control conditions, and the results were summarized in a classic book by Mary Lee Smith, Gene Glass, and Thomas Miller (Smith, Glass, & Miller, 1980). They found that overall psychotherapy was quite effective, with about 80% of treatment participants improving more than the average control participant. Subsequent research has focused more on the conditions under which different types of psychotherapy are more or less effective.
Interrupted Time Series Design
A variant of the pretest-posttest design is the interrupted time-series design. A time series is a set of measurements taken at intervals over a period of time. For example, a manufacturing company might measure its workers’ productivity each week for a year. In an interrupted time-series design, a time series like this is “interrupted” by a treatment. In one classic example, the treatment was the reduction of the work shifts in a factory from 10 hours to 8 hours (Cook & Campbell, 1979). Because productivity increased rather quickly after the shortening of the work shifts, and because it remained elevated for many months afterward, the researcher concluded that the shortening of the shifts caused the increase in productivity. Notice that the interrupted time-series design is like a pretest-posttest design in that it includes measurements of the dependent variable both before and after the treatment. It is unlike the pretest-posttest design, however, in that it includes multiple pretest and posttest measurements.
Figure 7.5 "A Hypothetical Interrupted Time-Series Design" shows data from a hypothetical interrupted time-series study. The dependent variable is the number of student absences per week in a research methods course. The treatment is that the instructor begins publicly taking attendance each day so that students know that the instructor is aware of who is present and who is absent. The top panel of Figure 7.5 "A Hypothetical Interrupted Time-Series Design" shows how the data might look if this treatment worked. There is a consistently high number of absences before the treatment, and there is an immediate and sustained drop in absences after the treatment. The bottom panel of Figure 7.5 "A Hypothetical Interrupted Time-Series Design" shows how the data might look if this treatment did not work. On average, the number of absences after the treatment is about the same as the number before. This figure also illustrates an advantage of the interrupted time-series design over a simpler pretest-posttest design. If there had been only one measurement of absences before the treatment at Week 7 and one afterward at Week 8, then it would have looked as though the treatment were responsible for the reduction. The multiple measurements both before and after the treatment suggest that the reduction between Weeks 7 and 8 is nothing more than normal week-to-week variation.
Figure 7.5 A Hypothetical Interrupted Time-Series Design
The top panel shows data that suggest that the treatment caused a reduction in absences. The bottom panel shows data that suggest that it did not.
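The two panels of Figure 7.5 can be mimicked with a small simulation. This is a hedged sketch; the weekly absence counts and noise level are invented for illustration:

```python
import random

random.seed(1)

WEEKS_PRE, WEEKS_POST = 7, 7

def simulate(baseline, post_mean):
    """Weekly absence counts: noisy around `baseline` before the treatment,
    noisy around `post_mean` after it."""
    pre = [max(0, round(random.gauss(baseline, 1.5))) for _ in range(WEEKS_PRE)]
    post = [max(0, round(random.gauss(post_mean, 1.5))) for _ in range(WEEKS_POST)]
    return pre, post

def mean(xs):
    return sum(xs) / len(xs)

# Top panel: the treatment works, so absences drop from ~8 to ~3 per week.
pre, post = simulate(8, 3)
print("treatment works:", mean(pre), "->", mean(post))

# Bottom panel: no effect, so absences stay around 8. Any single pair of
# adjacent weeks (e.g., Week 7 vs. Week 8) could still differ by chance,
# which is why the multiple measurements matter.
pre2, post2 = simulate(8, 8)
print("no effect:      ", mean(pre2), "->", mean(post2))
```

Averaging over many pre- and post-treatment weeks makes a genuine drop stand out from ordinary week-to-week variation, which a single pretest and posttest cannot do.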
Combination Designs
A type of quasi-experimental design that is generally better than either the nonequivalent groups design or the pretest-posttest design is one that combines elements of both. There is a treatment group that is given a pretest, receives a treatment, and then is given a posttest. But at the same time there is a control group that is given a pretest, does not receive the treatment, and then is given a posttest. The question, then, is not simply whether participants who receive the treatment improve but whether they improve more than participants who do not receive the treatment.
Imagine, for example, that students in one school are given a pretest on their attitudes toward drugs, then are exposed to an antidrug program, and finally are given a posttest. Students in a similar school are given the pretest, not exposed to an antidrug program, and finally are given a posttest. Again, if students in the treatment condition become more negative toward drugs, this could be an effect of the treatment, but it could also be a matter of history or maturation. If it really is an effect of the treatment, then students in the treatment condition should become more negative than students in the control condition. But if it is a matter of history (e.g., news of a celebrity drug overdose) or maturation (e.g., improved reasoning), then students in the two conditions would be likely to show similar amounts of change. This type of design does not completely eliminate the possibility of confounding variables, however. Something could occur at one of the schools but not the other (e.g., a student drug overdose), so students at the first school would be affected by it while students at the other school would not.
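The logic of this combination design can be sketched numerically. All of the scores below are invented for illustration; the point is the comparison of changes, not the particular numbers:

```python
# Hypothetical mean attitude scores (higher = more negative toward drugs).
treatment_pre, treatment_post = 4.0, 6.5   # school with the antidrug program
control_pre, control_post = 4.1, 5.0       # comparison school, no program

treatment_change = treatment_post - treatment_pre
control_change = control_post - control_pre

# History and maturation should push both schools' scores in roughly the
# same way, so the excess change in the treatment group is a rough
# estimate of the treatment effect.
effect_estimate = treatment_change - control_change
print(round(effect_estimate, 1))
```

Both groups became more negative toward drugs, but the treatment group changed more; that excess change is what the design attributes to the program.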
Finally, if participants in this kind of design are randomly assigned to conditions, it becomes a true experiment rather than a quasi-experiment. In fact, it is the kind of experiment that Eysenck called for—and that has now been conducted many times—to demonstrate the effectiveness of psychotherapy.
- Quasi-experimental research involves the manipulation of an independent variable without the random assignment of participants to conditions or orders of conditions. Among the important types are nonequivalent groups designs, pretest-posttest designs, and interrupted time-series designs.
- Quasi-experimental research eliminates the directionality problem because it involves the manipulation of the independent variable. It does not eliminate the problem of confounding variables, however, because it does not involve random assignment to conditions. For these reasons, quasi-experimental research is generally higher in internal validity than correlational studies but lower than true experiments.
- Practice: Imagine that two college professors decide to test the effect of giving daily quizzes on student performance in a statistics course. They decide that Professor A will give quizzes but Professor B will not. They will then compare the performance of students in their two sections on a common final exam. List five other variables that might differ between the two sections that could affect the results.
- Discussion: Imagine that a group of obese children is recruited for a study in which their weight is measured, then they participate for 3 months in a program that encourages them to be more active, and finally their weight is measured again. Explain how each of the following might affect the results:
- regression to the mean
- spontaneous remission
7.4 Qualitative Research
- List several ways in which qualitative research differs from quantitative research in psychology.
- Describe the strengths and weaknesses of qualitative research in psychology compared with quantitative research.
- Give examples of qualitative research in psychology.
What Is Qualitative Research?
This book is primarily about quantitative research. Quantitative researchers typically start with a focused research question or hypothesis, collect a small amount of data from each of a large number of individuals, describe the resulting data using statistical techniques, and draw general conclusions about some large population. Although this is by far the most common approach to conducting empirical research in psychology, there is an important alternative called qualitative research. Qualitative research originated in the disciplines of anthropology and sociology but is now used to study many psychological topics as well. Qualitative researchers generally begin with a less focused research question, collect large amounts of relatively “unfiltered” data from a relatively small number of individuals, and describe their data using nonstatistical techniques. They are usually less concerned with drawing general conclusions about human behavior than with understanding in detail the experience of their research participants.
Consider, for example, a study by researcher Per Lindqvist and his colleagues, who wanted to learn how the families of teenage suicide victims cope with their loss (Lindqvist, Johansson, & Karlsson, 2008). They did not have a specific research question or hypothesis, such as, What percentage of family members join suicide support groups? Instead, they wanted to understand the variety of reactions that families had, with a focus on what it is like from their perspectives. To do this, they interviewed the families of 10 teenage suicide victims in their homes in rural Sweden. The interviews were relatively unstructured, beginning with a general request for the families to talk about the victim and ending with an invitation to talk about anything else that they wanted to tell the interviewer. One of the most important themes that emerged from these interviews was that even as life returned to “normal,” the families continued to struggle with the question of why their loved one committed suicide. This struggle appeared to be especially difficult for families in which the suicide was most unexpected.
The Purpose of Qualitative Research
Again, this book is primarily about quantitative research in psychology. The strength of quantitative research is its ability to provide precise answers to specific research questions and to draw general conclusions about human behavior. This is how we know that people have a strong tendency to obey authority figures, for example, or that female college students are not substantially more talkative than male college students. But while quantitative research is good at providing precise answers to specific research questions, it is not nearly as good at generating novel and interesting research questions. Likewise, while quantitative research is good at drawing general conclusions about human behavior, it is not nearly as good at providing detailed descriptions of the behavior of particular groups in particular situations. And it is not very good at all at communicating what it is actually like to be a member of a particular group in a particular situation.
But the relative weaknesses of quantitative research are the relative strengths of qualitative research. Qualitative research can help researchers to generate new and interesting research questions and hypotheses. The research of Lindqvist and colleagues, for example, suggests that there may be a general relationship between how unexpected a suicide is and how consumed the family is with trying to understand why the teen committed suicide. This relationship can now be explored using quantitative research. But it is unclear whether this question would have arisen at all without the researchers sitting down with the families and listening to what they themselves wanted to say about their experience. Qualitative research can also provide rich and detailed descriptions of human behavior in the real-world contexts in which it occurs. Among qualitative researchers, this is often referred to as “thick description” (Geertz, 1973). Similarly, qualitative research can convey a sense of what it is actually like to be a member of a particular group or in a particular situation—what qualitative researchers often refer to as the “lived experience” of the research participants. Lindqvist and colleagues, for example, describe how all the families spontaneously offered to show the interviewer the victim’s bedroom or the place where the suicide occurred—revealing the importance of these physical locations to the families. It seems unlikely that a quantitative study would have discovered this.
Data Collection and Analysis in Qualitative Research
As with correlational research, data collection approaches in qualitative research are quite varied and can involve naturalistic observation, archival data, artwork, and many other things. But one of the most common approaches, especially for psychological research, is to conduct interviews. Interviews can be structured, semistructured, or unstructured, depending on how well specified the sequence of questions or prompts is. In qualitative research they tend to be unstructured—consisting of a small number of general questions or prompts that allow participants to talk about what is of interest to them. The researcher can follow up by asking more detailed questions about the topics that do come up. Such interviews can be lengthy and detailed, but they are usually conducted with a relatively small sample. This was essentially the approach used by Lindqvist and colleagues in their research on the families of suicide survivors. Small groups of people who participate together in interviews focused on a particular topic or issue are often referred to as focus groups. The interaction among participants in a focus group can sometimes bring out more information than can be learned in a one-on-one interview. The use of focus groups has become a standard technique in business and industry among those who want to understand consumer tastes and preferences. The content of all focus group interviews is usually recorded and transcribed to facilitate later analyses.
Another approach to data collection in qualitative research is participant observation, in which researchers become active participants in the group or situation they are studying. The data they collect can include interviews (usually unstructured), their own notes based on their observations and interactions, documents, photographs, and other artifacts. The basic rationale for participant observation is that there may be important information that is only accessible to, or can be interpreted only by, someone who is an active participant in the group or situation. An example of participant observation comes from a study by sociologist Amy Wilkins (published in Social Psychology Quarterly) on a college-based religious organization that emphasized how happy its members were (Wilkins, 2008). Wilkins spent 12 months attending and participating in the group’s meetings and social events, and she interviewed several group members. In her study, Wilkins identified several ways in which the group “enforced” happiness—for example, by continually talking about happiness, discouraging the expression of negative emotions, and using happiness as a way to distinguish themselves from other groups.
Data Analysis in Qualitative Research
Although quantitative and qualitative research generally differ along several important dimensions (e.g., the specificity of the research question, the type of data collected), it is the method of data analysis that distinguishes them more clearly than anything else. To illustrate this idea, imagine a team of researchers that conducts a series of unstructured interviews with recovering alcoholics to learn about the role of their religious faith in their recovery. Although this sounds like qualitative research, imagine further that once they collect the data, they code the data in terms of how often each participant mentions God (or a “higher power”), and they then use descriptive and inferential statistics to find out whether those who mention God more often are more successful in abstaining from alcohol. Now it sounds like quantitative research. In other words, the quantitative-qualitative distinction depends more on what researchers do with the data they have collected than with why or how they collected the data.
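The coding step described above can be sketched in a few lines. The transcripts and the keyword rule here are invented for illustration; real coding schemes are far more elaborate and usually involve trained human coders:

```python
import re

# Hypothetical interview excerpts (invented for illustration).
transcripts = {
    "p1": "God helped me through it. My higher power kept me sober.",
    "p2": "I leaned on my sponsor and my family.",
    "p3": "Every morning I pray to God. God gives me strength.",
}

def code_mentions(text):
    """Count mentions of God or a 'higher power' -- the coding step that
    turns qualitative transcripts into quantitative data."""
    return len(re.findall(r"\b(god|higher power)\b", text.lower()))

counts = {pid: code_mentions(t) for pid, t in transcripts.items()}
print(counts)  # {'p1': 2, 'p2': 0, 'p3': 2}
```

Once every participant has a count like this, it can be correlated with an outcome measure such as months of abstinence, and at that point the analysis is quantitative.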
But what does qualitative data analysis look like? Just as there are many ways to collect data in qualitative research, there are many ways to analyze data. Here we focus on one general approach called grounded theory (Glaser & Strauss, 1967). This approach was developed within the field of sociology in the 1960s and has gradually gained popularity in psychology. Remember that in quantitative research, it is typical for the researcher to start with a theory, derive a hypothesis from that theory, and then collect data to test that specific hypothesis. In qualitative research using grounded theory, researchers start with the data and develop a theory or an interpretation that is “grounded in” those data. They do this in stages. First, they identify ideas that are repeated throughout the data. Then they organize these ideas into a smaller number of broader themes. Finally, they write a theoretical narrative—an interpretation of the data in terms of the themes that they have identified. This theoretical narrative focuses on the subjective experience of the participants and is usually supported by many direct quotations from the participants themselves.
As an example, consider a study by researchers Laura Abrams and Laura Curran, who used the grounded theory approach to study the experience of postpartum depression symptoms among low-income mothers (Abrams & Curran, 2009). Their data were the result of unstructured interviews with 19 participants. Table 7.1 "Themes and Repeating Ideas in a Study of Postpartum Depression Among Low-Income Mothers" shows the five broad themes the researchers identified and the more specific repeating ideas that made up each of those themes. In their research report, they provide numerous quotations from their participants, such as this one from “Destiny”:
Well, just recently my apartment was broken into and the fact that his Medicaid for some reason was cancelled so a lot of things was happening within the last two weeks all at one time. So that in itself I don’t want to say almost drove me mad but it put me in a funk.…Like I really was depressed. (p. 357)
Their theoretical narrative focused on the participants’ experience of their symptoms not as an abstract “affective disorder” but as closely tied to the daily struggle of raising children alone under often difficult circumstances.
Table 7.1 Themes and Repeating Ideas in a Study of Postpartum Depression Among Low-Income Mothers
| Theme | Repeating ideas |
| --- | --- |
| Ambivalence | “I wasn’t prepared for this baby,” “I didn’t want to have any more children.” |
| Caregiving overload | “Please stop crying,” “I need a break,” “I can’t do this anymore.” |
| Juggling | “No time to breathe,” “Everyone depends on me,” “Navigating the maze.” |
| Mothering alone | “I really don’t have any help,” “My baby has no father.” |
| Real-life worry | “I don’t have any money,” “Will my baby be OK?” “It’s not safe here.” |
The Quantitative-Qualitative “Debate”
Given their differences, it may come as no surprise that quantitative and qualitative research in psychology and related fields do not coexist in complete harmony. Some quantitative researchers criticize qualitative methods on the grounds that they lack objectivity, are difficult to evaluate in terms of reliability and validity, and do not allow generalization to people or situations other than those actually studied. At the same time, some qualitative researchers criticize quantitative methods on the grounds that they overlook the richness of human behavior and experience and instead answer simple questions about easily quantifiable variables.
In general, however, qualitative researchers are well aware of the issues of objectivity, reliability, validity, and generalizability. In fact, they have developed a number of frameworks for addressing these issues (which are beyond the scope of our discussion). And in general, quantitative researchers are well aware of the issue of oversimplification. They do not believe that all human behavior and experience can be adequately described in terms of a small number of variables and the statistical relationships among them. Instead, they use simplification as a strategy for uncovering general principles of human behavior.
Many researchers from both the quantitative and qualitative camps now agree that the two approaches can and should be combined into what has come to be called mixed-methods research (Todd, Nerlich, McKeown, & Clarke, 2004). (In fact, the studies by Lindqvist and colleagues and by Abrams and Curran both combined quantitative and qualitative approaches.) One approach to combining quantitative and qualitative research is to use qualitative research for hypothesis generation and quantitative research for hypothesis testing. Again, while a qualitative study might suggest that families who experience an unexpected suicide have more difficulty resolving the question of why, a well-designed quantitative study could test this hypothesis by measuring these specific variables in a large sample. A second approach to combining quantitative and qualitative research is referred to as triangulation: using multiple quantitative and qualitative methods simultaneously to study the same general questions and comparing the results. If the results of the quantitative and qualitative methods converge on the same general conclusion, they reinforce and enrich each other. If the results diverge, then they suggest an interesting new question: Why do the results diverge and how can they be reconciled?
- Qualitative research is an important alternative to quantitative research in psychology. It generally involves asking broader research questions, collecting more detailed data (e.g., interviews), and using nonstatistical analyses.
- Many researchers conceptualize quantitative and qualitative research as complementary and advocate combining them. For example, qualitative research can be used to generate hypotheses and quantitative research to test them.
- Discussion: What are some ways in which a qualitative study of girls who play youth baseball would be likely to differ from a quantitative study on the same topic?
Experiment and Non-experiment
"Experiment" is a widely misused term. When some people talk about their "experiment," their study is actually non-experimental in nature. The following are the characteristics of experimental and non-experimental research designs.
It is very common for even experienced researchers to confuse random sampling with randomization.
- Random sampling: a sampling method in which each member of a set has an independent chance of being selected (the "equal chances" described by many textbooks is a theoretical ideal; in the real world there is always some hidden bias or disposition).
- Randomization: randomly assigning subjects to the control group and the treatment group.
- Experimenter manipulation: directly manipulating variables to test cause-and-effect relationships, e.g., altering the amount of a drug given to patients. The researcher manipulates the factor that she cares about.
- Experimenter control: holding constant all other extraneous variables or conditions that might have an impact on the dependent variables. The researcher removes the effects that she doesn't care about.
What is wrong with randomization in qualitative inquiry? Morse (2007), for example, wrote:
Processes of saturation are essential in qualitative inquiry: saturation ensures replication and validation of data; and it ensures that our data are valid and reliable. If we select a sample randomly, the factors that we are interested in for our study would be normally distributed in our data, and be represented by some sort of a curve, normal or skewed. Regardless of the type of curve, we would have lots of data about common events, and inadequate data about less common events. Given that a qualitative data set requires a more rectangular distribution to achieve saturation, with randomization we would have too much data around the mean (and be swamped with the excess), and not enough data to saturate on categories in the tails of the distribution (p. 234, emphasis added).
As of August 7, 2017, the website of the Department of Statistics explained the role of sampling in statistical inference as follows:

The use of randomization in sampling allows for the analysis of results using the methods of statistical inference. Statistical inference is based on the laws of probability, and allows analysts to infer conclusions about a given population based on results observed through random sampling. (para. 1, emphasis added)

Again, randomization is concerned with the assignment of group membership after the sample is drawn, whereas random sampling is a subject selection process.
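The two processes can be sketched in a few lines of Python (the population and sample sizes are purely illustrative):

```python
import random

# A toy sketch distinguishing random sampling (who gets into the study)
# from randomization (which group each selected subject lands in).
random.seed(42)

population = [f"person_{i}" for i in range(10_000)]

# Random sampling: each member of the population has an equal,
# independent chance of being selected into the study.
sample = random.sample(population, 40)

# Randomization: the selected subjects are randomly assigned
# to the treatment group or the control group.
random.shuffle(sample)
treatment, control = sample[:20], sample[20:]

print(len(treatment), len(control))  # 20 20
```

Note that the two steps are independent: a convenience (non-random) sample can still be randomized into groups, which is exactly the situation discussed below.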
Control and manipulation are crucial to experimentation. Without them, the conclusion drawn from an observed phenomenon could be completely wrong even if it seems to make sense. Let's look at an everyday example: One of my friends has two TV sets, one Japanese-made and the other European-made. She insisted that the Japanese TV had better quality than the European one because the former presented a sharper picture. Being skeptical of her claim, I conducted a small experiment: I simply swapped the locations of the two TV sets. As a result, the European TV set showed a clearer picture than the Japanese one. As you can see, the relevant factor here is the signal rather than the electronics. In an experiment, if I put all TVs under study in the same location, then location as a source of "noise" is under my control. If I alternate the location for each TV, then location becomes a variable under my manipulation.
Let's use herbs as another example: A Chinese friend maintained that some Chinese herbs could heal certain diseases. She even conducted an "experiment" to prove it. When her husband suffered a long-term illness, he took Chinese herbs for one week and his health improved substantially. The next week he stopped taking the herbs and his condition reversed. I asked her how many types of Chinese herbs her husband took; she answered, "Ten." If I fed a patient ten vitamins, I am sure he would get better, too! Because the chemical components of the herbs were never manipulated or partitioned, this "experiment" did not tell us which Chinese herb is helpful to which body function.
However, it is important to note that "control" is not the core essence of experimentation. The difference between controlled experiment and randomized experiment will be discussed in a later section.
Quasi-experiment

A quasi-experiment is a research design that does not meet all the requirements necessary for controlling the influence of extraneous variables. Usually what is missing is random assignment.
For example, when a researcher studies gender difference in computer use, obviously he cannot randomly assign gender (I am happy as a man. I don't want to be re-assigned).
It is generally agreed that the primary demarcation criterion between experiments and quasi-experiments is random assignment of group membership. Nonetheless, some authors consider random selection a criterion, too. For example, according to Plichta and Garzon (2009), "quasi-experimental designs may lack random selection, random assignments, or both" (p. 13). In a similar vein, Moule and Hek (2012) suggested that convenience sampling is "a part of survey or quasi-experiment designs" (p. 95).
Survey research

This type of research is very common in political science and communications, in which many variables are not controllable. For example, if you intend to study how wars affect people's perception of the quality of policy making, you cannot create a war or manipulate other world affairs, unless you are the villain in the movie "Tomorrow Never Dies." Because of this limitation, researchers send surveys to participants who are exposed to the real conditions.
Secondary analysis: Archival research

Archival research is a subset of secondary data analysis, but the two terms are not synonymous. Meta-analysis, in which the results of prior research are synthesized, is also a form of secondary analysis. As the name implies, archival research utilizes existing raw data archived in databases, whereas meta-analysis extracts statistical results from previous studies. If you don't like the tedious IRB process, go for secondary data analysis.
Archival research is popular in economics and educational research, especially when the research project involves trends or longitudinal data. For example, if the researcher wants to find out the correlation between productivity and school performance, he can contact the General Accounting Office and the Department of Education to obtain the related data for the last twenty years.
Obviously, there are advantages to archival data analysis:

- It saves time, effort, and money, because the data are available online (most online databases are free, but CCMH charges a full data access fee).
- It provides a basis for comparing the results of secondary data analysis with your primary data analysis (e.g., national sample vs. local sample).
- The sample size is much bigger than what you can collect by yourself. A small-sample study lacks statistical power, and the result might not be stable across different settings. By contrast, big data can reveal stable patterns.
- Many social science studies are conducted with samples that are disproportionately drawn from Western, educated, industrialized, rich, and democratic populations (WEIRD; Henrich, Heine, & Norenzayan, 2010). Nationwide and international data sets alleviate the WEIRD problem.

On the other hand, there are shortcomings and limitations. For example, you might be interested in analyzing disposable income, but the available variable is gross income. In other words, your research question is confined by what you have at hand (Management Study Guide, 2016).
Additionally, it is important to point out that very often there are discrepancies between different sources of archival data, and thus researchers should exercise caution in drawing firm conclusions from a single data source. For example, GDP per capita is commonly used in many archival research studies. Nonetheless, there are vast differences between two sources reporting each country's GDP per capita, namely, the World Development Indicators (WDI) and the Penn World Table 7.1 (PWT) (Ram & Ural, 2014). In addition, based on the 2005 UN Human Development statistics, Harris (n.d.) pointed out that the most atheistic societies, including many secular European nations, are the healthiest. However, in the Happy Planet Index none of those secular European countries is ranked among the top 20. The table below shows the recent figures of UNHD and HPI side by side.
Table: UN Human Development statistics and Happy Planet Index rankings, side by side.
Comments

Both natural settings and laboratory-controlled experiments have pros and cons. On some occasions, things that happen in real life challenge artificial experiments. For example, in some lab-controlled benchmark tests, Windows outperforms Mac OS, Linux, and even UNIX! But computer users tell different stories in real settings.
It is common for experimentation to be equated with scientific methodology, and thus highly regarded. Actually, certain scientific subjects do not rely heavily on experimentation, such as astronomy (the Big Bang, quantum tunneling) and physics (e.g., M-theory). In classical astronomy the major source of knowledge is observation rather than experimentation (Deese, 1972). For example, you cannot blow up Mars to see how the absence of Mars affects the gravitational forces of the Solar System (with modern rocket and nuclear technologies, humans may be able to do so, but we shouldn't)! And the study of the origin of the universe cannot count on even observation. Mathematics is another example. Although today, with the aid of high-powered computers, several mathematicians are able to conduct "mathematical experiments" by simulation (Chaitin, 1998), mathematical theorems originate largely from logical deduction. Lack of experimentation can also be found in certain areas of biology, such as evolution. Barkow (1989) pointed out that an evolutionary scenario is speculative, in which the usual requirements for empirical verifiability are relaxed in favor of an emphasis on logic and plausibility.
Randomization and Simpson's Paradox
Randomization is the major difference between experiments and quasi-experiments. It is important to point out some common misconceptions regarding randomization.
Random sampling and randomization

As mentioned before, many people confuse random sampling and randomization. The former is a sampling process, while the latter is concerned with the assignment of group membership. Further, the purpose of random sampling is to enhance the generalizability of the results, while the purpose of randomization is to establish cause-and-effect interpretations of the results. In other words, random sampling counteracts threats to external validity, whereas randomization addresses threats to internal validity. However, these concepts are easily confused (May & Hunter, 1988). The topic of internal and external validity will be discussed in another write-up entitled Threats to Validity of Research Design.
In practice, randomization plays a more important role than random sampling in research. Let's face it: how often can a researcher draw a truly random sample? If the target population consists of all university students, are you able to draw samples from campuses in states other than your own? As a matter of fact, most research studies recruit convenience subjects who are readily available (Frick, 1998). If the requirement of random sampling were strictly followed, experiments could hardly be implemented. In fact, Reichardt and Gollob (1999) found that in a randomized experiment, the use of a t test with a convenience sample can be justified without reference to a hypothetical infinite population from which random samples are drawn.
To rectify the situation of non-random sampling, randomization is used to spread errors randomly among treatment groups (Fisher, 1971). Pitman (1937a, 1937b, 1938) went so far as to assert that random sampling is unnecessary for a valid test of the difference between treatments in a randomized experiment. Using an example of 40 convenience subjects, Babbie (1992) conceptualized randomization as treating convenience samples as probability samples: "It is as though 40 subjects in this instance are a population from which we select two probability samples-each consisting the characteristics of the total population, so the two samples will mirror each other." (p.243)
However, like random sampling, randomization also encounters difficulties in implementation. Berk (2005) used the following example to illustrate one of the problems: even if the experimenter randomly assigns prisoners to different treatment programs, the inmates may fail to show up. This can turn a randomized experiment into an "intent-to-treat" experiment.
Simpson's Paradox

It is important to emphasize repeatedly that randomization is not a silver bullet. In addition to the attrition issue mentioned above, randomization is subject to the threat of Simpson's Paradox, which was discovered by E. H. Simpson (1951), not O. J. Simpson or Bart Simpson. Simpson's Paradox is a phenomenon in which the conclusion drawn from the aggregate data is the opposite of the conclusion drawn from a contingency table based upon the same data.
If this is too abstract, let's look at an example: In England, a 20-year follow-up study examined the survival and death rates of smokers and non-smokers. The result implied a significant positive effect of smoking, because only 24% of smokers died compared to 31% of non-smokers. Philip Morris should celebrate, right? Not yet. When the data were broken down by age group in a contingency table, it was found that there were more older people in the non-smoker group (Appleton & French, 1996).
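The structure of the paradox can be reproduced with a toy contingency table. The counts below are hypothetical, chosen only to mimic the pattern of the smoking study, not the actual Appleton and French data:

```python
# Hypothetical counts: within EVERY age group smokers die at a higher
# rate, yet the aggregate death rate of smokers is lower, because
# non-smokers are concentrated in the older (riskier) age group.

data = {
    # age group: {group: (deaths, total)}
    "young": {"smoker": (30, 300), "non-smoker": (5, 100)},
    "old":   {"smoker": (60, 100), "non-smoker": (150, 300)},
}

def rate(deaths, total):
    return deaths / total

# Stratified comparison: smokers fare worse in each age group.
for age, groups in data.items():
    s = rate(*groups["smoker"])
    n = rate(*groups["non-smoker"])
    print(f"{age}: smoker {s:.0%} vs non-smoker {n:.0%}")  # smoker higher

# Aggregate comparison: smokers appear to fare better.
agg = {}
for g in ("smoker", "non-smoker"):
    deaths = sum(data[a][g][0] for a in data)
    total = sum(data[a][g][1] for a in data)
    agg[g] = rate(deaths, total)
print("overall:", agg)  # smoker rate lower than non-smoker rate
```

The reversal happens because the grouping variable (age) is associated with both the "treatment" (smoking) and the outcome (death).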
Another example of Simpson's Paradox can be found in a study of student retention conducted at Arizona State University (Yu, DiGangi, Jannasch-Pennell, & Kaprolet, 2010). Although the initial analysis based on all data showed that among the students who stayed at the university, the probability of being a resident (p = .67) was higher than that of being a non-resident (p = .33), a seemingly opposite conclusion emerged when observations were grouped by state in a GIS analysis, as shown in Figure 1:
Figure 1. Retention rate mapped to student home states
How is Simpson's Paradox related to randomization? Obviously, the above studies used non-experimental data. You cannot ask people to become smokers or non-smokers. Neither can age be assigned (I wish it could be; if so, I would request to be assigned to the young age group). As a result, two groups that were non-equivalent in age led to Simpson's Paradox. Although randomization is said to prevent this from happening, randomization is not 100% foolproof. By simulation, Hsu (1989) found that when the sample size is small, randomization tends to make groups non-equivalent and increases the possibility of Simpson's Paradox. Thus, after randomization with a small sample, researchers should check the group characteristics on different dimensions (e.g., race, sex, age, academic year, etc.) rather than blindly trusting randomization.
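Hsu's point can be illustrated with a quick simulation under a hypothetical setup: 5 men and 5 women are randomly split into two groups of 5, and we count how often one group ends up with at most one man, i.e., a severe sex imbalance:

```python
import random

# A minimal sketch of why small-sample randomization often yields
# non-equivalent groups (cf. Hsu, 1989). The sample of 10 subjects
# and the imbalance criterion here are illustrative.
random.seed(0)

subjects = ["M"] * 5 + ["F"] * 5
trials = 100_000
unbalanced = 0
for _ in range(trials):
    random.shuffle(subjects)
    males_in_g1 = subjects[:5].count("M")
    if males_in_g1 <= 1 or males_in_g1 >= 4:
        unbalanced += 1

print(f"severely unbalanced splits: {unbalanced / trials:.1%}")
# roughly one in five randomizations is severely unbalanced on
# this single covariate alone, so checking group characteristics
# after randomizing a small sample is not optional
```

With more covariates (age, race, academic year, and so on), the chance that at least one of them is badly unbalanced in a small sample is even higher.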
Randomized and controlled experiments
Another common area of confusion is the difference between randomized and controlled experiments. Today "randomized experiment" and "controlled experiment" are often used synonymously. One reason is that an experiment usually consists of a control group and a treatment group, and participants are randomly assigned to one of the groups. Since "control" and "randomization" are both perceived as characteristics of an experiment, it is not surprising that in many texts the two terms are either used interchangeably or combined into one term, such as "randomized controlled experiment." The latter usage is legitimate as long as both control and randomization are implemented in the experiment. However, treating a randomized experiment as "a controlled experiment" and vice versa is misleading (e.g., "In controlled experiments, this is accomplished in part through the random assignment of participants to treatment and control groups" (Schneider et al., 2008)). Indeed, there is a subtle difference between the two.
R. A. Fisher was the pioneer of the randomized experiment. In Fisher's view, even if there is a significant difference between the control and treatment groups, we may not be able to attribute the difference to the treatment when there exist many uncontrollable variables and sampling fluctuations. The objective of randomization is to differentiate between associations due to causal effects of the treatment and associations due to some variable that is a common cause of both the treatment and response variables. If there are influences resulting from uncontrolled variables, randomization distributes those influences randomly across the control and treatment groups even though no control of those variables is made.
On the other hand, the logic of experimentation up to Fisher's time was that of the controlled experiment, in which many variables are experimentally fixed to a constant value. However, Fisher explicitly stated that this is an inferior method, because it is impossible to know which variables should be taken into account. For example, a careful researcher may assign equal numbers of males and females to each group but omit the age and educational level of the subjects. In Fisher's view, instead of attempting to put everything under control, the researcher should let randomization take care of the uncontrollable factors. This is not to suggest that Fisher did not advocate controlling for other causes in addition to randomization. Rather, he explicitly recommended that the researcher exercise as much control as he can, but advised that randomization must be employed as "the second line of defense" (Shipley, 2000).
Following the same line of reasoning, the Canadian Task Force for Preventive Health Care (2003) prefers randomized experiments to controlled trials without randomization as clinical evidence, as shown in the following table.
Rating: Research design
I: Evidence from randomized controlled trial(s)
II-1: Evidence from controlled trial(s) without randomization
II-2: Evidence from cohort or case-control analytic studies, preferably from more than one centre or research group
II-3: Evidence from comparisons between times or places with or without the intervention; dramatic results in uncontrolled experiments could be included here
III: Opinions of respected authorities, based on clinical experience; descriptive studies or reports of expert committees
Nonetheless, a randomized experiment is not necessarily superior to a controlled experiment. As mentioned before, when the sample size is small, randomization tends to make groups non-equivalent and thus may lead to Simpson's Paradox (Hsu, 1989). Not surprisingly, when the sample size is small, a controlled experiment is more advisable.
Smoking does not cause lung cancer, really?
It is important to point out that dogmatic thinking is counter-productive to science, which is supposed to be an open system. R. A. Fisher, the inventor of the randomized experiment, was dead wrong about the relationship between smoking and lung cancer. Between 1922 and 1947, the rate of deaths attributed to lung cancer surged 15-fold across England and Wales. In 1947, Austin Bradford Hill and Richard Doll were hired by the British Medical Research Council to investigate the possible cause of this epidemic. Obviously, it would be unethical to conduct a randomized experiment, such as randomly assigning 3,000 healthy people to the smoking group and 3,000 to the control group. Instead, Hill and Doll conducted surveys in the hospitals of London. Doll was so stunned by the fact that people who smoked tended to die of lung cancer that he gave up smoking two-thirds of the way through the study. In 1950, Hill and Doll published their report in the British Medical Journal, suggesting a causal link between smoking and lung cancer. In 1957, Fisher, who was a smoker, sent a letter to the journal to repudiate their conclusion. His reasoning was simple: without running a randomized experiment we cannot assert a cause-and-effect relationship between tobacco and lung cancer. Fisher insisted upon his position and kept arguing against his opponents until he died in 1962 (Christopher, 2016). Nonetheless, at least Fisher practiced what he preached: he kept smoking until his death!
Australian approach cannot work in America, really?
The previous example shows that the dogma of randomized experimentation can hinder researchers from drawing sound causal conclusions and delay countermeasures against threats (e.g., the environmental hazard of DDT pointed out by Silent Spring and the climate change suggested by the IPCC). In addition, Berwick (2008) challenged the view that randomized experiments can be applied to all situations. Many years ago, the Rapid Response Team (RRT), an innovative preventative health care approach introduced by Australian doctors in which a team of physicians and nurses monitors patients' vital signs and takes proactive action, was implemented in the United States. But randomized experiments conducted by American researchers showed no significant differences between RRT and non-RRT approaches in terms of reducing the number of unexpected deaths. Berwick questioned the validity of this conclusion, for it ignored the cultural context and the specific delivery mechanisms.
Similarly, Rawlins disputed the experimental "gold standard" in medical research by listing the limitations of randomized and controlled experiments. First, like social scientists, medical researchers sometimes face a "mission impossible" scenario when the disease under investigation is extremely rare and thus the number of patients is very small. Second, on some occasions experimentation is unnecessary, especially when a treatment produces a "dramatic" benefit, such as Imatinib (Glivec) for chronic myeloid leukemia. In health science research there is a stopping rule: when the treatment shows healing effects, the trial should be stopped early so that the control group can switch to the more effective treatment. There is no consensus among statisticians as to how best to handle this situation, but treating this type of incomplete experiment as invalid would throw out valuable information (cited in Medical News Today, 2008).
Essock et al. (2003) also observed a discrepancy between "real world" and lab settings. Many drug treatment studies last only about four to eight weeks. Short-term drug tests may cost less to implement, but usually these studies do not yield the statistical significance found in long-term experiments. On the other hand, long-term drug trials have problems retaining participants long enough to yield unbiased outcomes. In other words, the so-called causal conclusions produced in experiments may not reflect what would happen in the real world.
The dictator game in the real world
The dictator game, which is often used for studying morality and cooperative behaviors, is another good example. In a typical experiment utilizing the dictator game, the participant is told to decide how much of a $10 pie he would like to give to an anonymous person who has signed up for the same experimental session. The game is so named because the decision made by the giver is final. Most experimental results are encouraging: many participants were willing to share the wealth. However, the result is completely different when the dictator game is conducted in a naturalistic setting. In a study carried out by Winking and Mizer (2013) at a bus stop in Las Vegas, the researcher told strangers that he was in a hurry to get to the airport and therefore wanted to give away his $20 in casino chips. The researcher explicitly suggested that the receivers share a portion of the money with another stranger at the bus stop, who was actually a member of the research team. In contrast to the experimental result, no one in the naturalistic study gave any portion of the endowment to the stranger. Thus, Winking and Mizer suspected that in past studies the experimental context induced participants to choose prosocial options.
The Pepsi challenge
The preceding examples may seem remote, so let's look at products that we consume every day: Coke and Pepsi. In experimental settings, most participants prefer Pepsi to Coke. However, Gladwell (2007) disputed the result by presenting evidence that this so-called "Pepsi Challenge" is based on the unrealistic "sip test" method. Most tasters favor the sweeter of two beverages when they take only a single sip, but the result is reversed when the entire can or bottle is consumed. (I am skeptical of this type of taste test, including wine tests, coffee tests, water tests, etc. Our limited senses may not be able to distinguish one item from another when the difference is very subtle. In one experiment, the researcher tinted white wine and asked wine experts to rate the "red wine." Surprisingly, the experts did not recognize that it was not a glass of red wine!)
(Similar results are found in coffee tests and water tests).
Other elements and sample size
In educational research, the What Works Clearinghouse (WWC) still adopts the conventional ranking of study types. Slavin criticized this criterion, pointing out that in small, brief, and artificial studies random assignment does not necessarily guarantee validity; over-emphasizing randomized studies without taking sample size and other design elements into account might introduce bias that "can lead to illogical conclusions" (p. 11).
Ruling out rival interpretations in quasi-experiments and observational studies

Some statisticians assert that one can never draw causal inferences without experimental manipulation (e.g., SAS Institute, 1999). Some researchers argue that causal inferences are weakened in quasi-experiments (e.g., Keppel & Zedeck, 1989). However, Christensen (1988) held a more liberal position:

Many causal inferences are made without using the experimental framework; they are made by rendering other rival interpretations implausible. If a friend of yours unknowingly stepped in front of an oncoming car and was pronounced dead after being hit by the car, you would probably attribute her death to the moving vehicle. Your friend might have died as a result of numerous other causes (a heart attack, for example), but such alternative explanations are not accepted because they are not plausible. In like manner, the causal interpretations arrived at from quasi-experimentation analysis are those that are consistent with the data in situations where rival interpretations have been shown to be implausible. (p. 306)

I would go further than Christensen and assert that even some observational studies can yield valid causal conclusions. While the car accident in Christensen's argument is hypothetical, we can find a similar example in real life. Some researchers assert that we can still attribute causal factors to effects with observational data if virtually identical units with two different outcomes are observed. To attribute causal factors to accidents, in Georgia 300 accidents were compared to 300 non-accidents involving the same car, driver, weather conditions, and lighting. The non-accidents occurred one mile back on the same road, a location passed by the driver minutes earlier en route to the crash site. Researchers found a substantial excess of roads that curved more than six degrees with downhill gradients.
In another example, to answer the question of whether helmets reduce the risk of death in motorcycle crashes, virtually identical units were compared: cases in which two people rode the same motorcycle, a driver and a passenger, one helmeted and the other not. Researchers concluded that wearing a helmet resulted in a 40% reduction in risk (Rosenbaum, 2005).
A similar scenario can be found in political and economic studies. During the Cold War era, the world was divided into three camps, namely, the Communist world led by the Soviet Union and the People's Republic of China, the capitalist conglomerate led by the United States, and the non-aligned countries. Some countries were partitioned into two political entities owing to unresolved ideological differences embraced by different local parties. Obvious examples include North Korea and South Korea, Mainland China and the Republic of China (Taiwan), East Germany and West Germany, and North Vietnam and South Vietnam. This division was not a result of randomization, of course. Nevertheless, the observational data about the two camps can still inform us about certain causes and effects. Many years ago, philosopher Margaret Walker (personal communication) argued that there is no causal relationship between Communist ideology and the horrible consequences in the Communist countries. I held a different view. As mentioned before, we can still attribute causal factors to effects with observational data when virtually identical conditions associated with the outcomes are observed. In terms of cultural heritage, language, and racial attributes, the two countries in each pair on the preceding list share a high degree of resemblance. The major difference is found only in the political and economic system. Owing to self-isolation and the containment policy of the West, the Communist blocs could "experiment" with central planning, class struggle, and so on without much outside influence. Needless to say, after half a century people were disenchanted by the broken economy and the lack of human rights in those Communist countries (Courtois et al., 1999). It would be difficult to deny a causal relationship between Communism and those undesirable consequences (Yu, 2009).
Another good example of a natural experiment is racial diversity before and after Proposition 209. In 1996, the State of California passed Proposition 209, which prohibited public institutions from using race-based admission policies. After Proposition 209 there was a 50-percent reduction in black freshman enrollment and a 25-percent drop for Hispanics. Nonetheless, although black and Hispanic enrollment was reduced at the most prestigious University of California campuses (-42% at UC Berkeley; -37% at UCLA), other less competitive UC campuses increased their black and Hispanic enrollment (+22% at UC Irvine; +18% at UC Santa Cruz; +65% at UC Riverside) (Sander & Taylor, 2012).
Lurking variables, proxy measures, and theoretical causal variables in correlational studies

Archival research is also called correlational research because cause-and-effect inferences cannot be directly made. For example, even though the last twenty years of data show a positive correlation between productivity and school performance, it would be a leap of faith to conclude that school performance gain is the cause of productivity gain or vice versa. Usually another variable, which may be the true cause, is "lurking" in the background. This variable is called a lurking variable, and it is easily undetected by a correlational study.
Even if the researcher is aware of the existence of lurking variables, he or she has no control over what data were collected. Rather, the researcher must go by the existing variables available in the data bank. Another limitation that hinders the researcher from drawing a valid causal inference from archival data is the problem of indirect measurement. On some occasions the variable chosen by the researcher is a proxy measure of what the researcher intends to study. For example, the researcher may be interested in studying the causal relationship between Christian spirituality and productivity. If the instrument is designed by the researcher, he or she might insert questions like "How often do you pray?" and "How often do you attend church activities?" or other questions specific to Christian spirituality into the survey. However, when archival data are downloaded from the Internet, the researcher might use general demographics (e.g., religious affiliation) to indicate Christian spirituality. In other words, the researcher will make inferences based on inferences (a proxy measure). Although the problems of lurking variables and proxy measures can also occur with other research methods, they are especially severe when the researcher is unable to customize the instrument.
There are many jokes about careless use of correlational studies. For example, one study indicated that consumption of alcohol improves academic performance (the explanation may be something else: when the overall economy improves, both alcohol consumption and academic performance go up). A study in Taiwan during the 1970s indicated that the more woks a household owned, the fewer children the family had; thus, the government gave woks to households in an attempt to lower the national birth rate. The moral of these stories: researchers should select theoretical causal variables even when the study is correlational.
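The lurking-variable problem above can be illustrated with a short simulation. In this sketch (the variable names and effect sizes are hypothetical, chosen only for illustration), a single background factor drives both "productivity" and "school performance"; the two measures are strongly correlated even though neither causes the other, and the correlation vanishes once the lurking variable is controlled for:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical lurking variable, e.g., overall economic growth
z = rng.normal(size=n)

# Both measures are driven by z plus independent noise --
# neither one causes the other.
productivity = 2.0 * z + rng.normal(size=n)
school_perf = 1.5 * z + rng.normal(size=n)

# The raw correlation looks impressive...
r_raw = np.corrcoef(productivity, school_perf)[0, 1]

# ...but it disappears once z is controlled for: correlate the
# residuals left after regressing each variable on z.
def residuals(v, z):
    slope = np.cov(v, z, ddof=1)[0, 1] / np.var(z, ddof=1)
    return v - slope * z

r_partial = np.corrcoef(residuals(productivity, z),
                        residuals(school_perf, z))[0, 1]

print(f"raw correlation:     {r_raw:.2f}")      # large
print(f"partial correlation: {r_partial:.2f}")  # near zero
```

A correlational study that records only the two outcome variables would see the large raw correlation and have no way to detect that z is doing all the work.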
Nevertheless, Luker, Luker, Jr., Cobb, and Brown (1998) defended the use of causal inference in correlation/regression frameworks:

In the social and behavioral sciences, experimental randomization and control are usually not possible. This has led to an awkward condition in which our work does not permit useful policy recommendations. The well-intentioned assertion that relationships do not mean causation, while useful in contesting gross simple-mindedness, is paralyzing and misleading in the social sciences. Or, as Dewey puts it, the critical characteristic of all scientific operations is revealing relationships. Relationships are a necessary condition of causation. We know that X cannot be a cause of Y unless X and Y are related. The causal analysis of nonexperimental data, therefore, can only go on through the analysis of relationships. Causal inference from non-experimental data, then, requires the testing of theoretical causal variables in a variety of quasi-experimental or multiple regression frameworks... Statistical failures of models suggest that we are not on the right track. Confirmation of the models suggests the possibility of ameliorative solutions.
It is noteworthy that the problems faced by experimentation can also be found in quasi-experiments. The main point is that the "real world" is more complicated than an experimental setting, in which the treatment and the outcome, or the cause and the effect, have a one-to-one mapping. Murnane and Willett (2011) wrote, "Randomized experiments and quasi-experiments typically provide estimates of the total effect of a policy intervention on one or more outcomes, not the effects of the intervention holding constant the levels of other inputs" (p. 31). This issue, which concerns internal validity and external validity, will be discussed in the write-up entitled Threats to Validity of Research Design.
Explicit questions and selection bias in survey research
Whether causal inferences can be drawn from survey research is debatable. It is true that survey research does not involve any manipulation of variables. However, when a questionnaire includes explicit questions concerning rationale and motivation, such as "Why do you choose Web-based instruction over conventional instruction?", it is difficult to argue that the answers provided by respondents carry no information about cause and effect.
Generalizability always goes hand in hand with causal inferences, and survey research is not weaker than experimentation in this regard. In many situations, survey research tends to obtain a more random sample than experimental research does. Subjects are usually required to be physically present in experimental studies, and thus only convenience samples are recruited from the local campus or the local town. Survey research can break through this limitation by sending questionnaires to prospective subjects across the country. In the age of the Internet, the researcher can even set up an online form to reach potential respondents all over the world.
However, one may argue that a "cyber-sample" is a self-selected sample rather than a random sample. In this case a systematic bias may affect who responds to the questionnaire and who does not. The Chicago Daily Tribune's prediction of "Dewey defeats Truman" in the 1948 presidential election is a classic example of selection bias: the interviewees were polled by phone, and thus the sample was confined to households that owned a telephone. By the same token, when a survey is posted on the Web, it is likely that respondents are computer literate and have access to computer equipment. Indeed, the same problem can be found in experimental research: subjects may refuse to participate in the experiment or may withdraw from the study even after they have started. In both survey research and experimental research, the question is not whether there are missing data. Rather, the question should be: "Are the data missing completely at random?"
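The telephone-poll story can be sketched as a small simulation. All of the numbers below are hypothetical, chosen only to show the mechanism: support for a candidate and phone ownership are both tied to wealth, so restricting the sample to phone owners means the data are not missing completely at random, and the poll estimate is biased:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical electorate: wealthier voters favor Dewey more often.
wealthy = rng.random(n) < 0.5
favors_dewey = rng.random(n) < np.where(wealthy, 0.45, 0.35)

# Selection mechanism: phone ownership is far more common among the
# wealthy, so the poll's missing data are NOT missing completely at random.
owns_phone = rng.random(n) < np.where(wealthy, 0.8, 0.2)

true_support = favors_dewey.mean()          # about 0.40 in this setup
phone_poll = favors_dewey[owns_phone].mean()

print(f"true support:       {true_support:.3f}")
print(f"phone-poll support: {phone_poll:.3f}")  # biased upward
```

If phone ownership were instead assigned independently of wealth, the restricted sample would be missing completely at random and the poll estimate would match the true support.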
Nonetheless, if the subject matter to be studied is Web-based instruction, this should not be considered a selection bias. In an online survey concerning Web-based instruction, the researcher should expect all respondents to possess basic computer operation skills and to have access to the Internet. (Once I assisted a researcher in posting an online survey on my database server, and several respondents, who used 2400-baud modems, complained that it took five to ten minutes to load a page.)
Research design and statistical analysis
Traditionally, analysis of variance (ANOVA) is said to be appropriate for data collected in an experiment, whereas regression analysis is considered the proper method for data collected in non-experimental designs. Keppel and Zedeck (1989) argued that both ANOVA and regression are suitable for experimental designs, while only regression fits most non-experimental designs. In other words, regression is applicable to both experimental and non-experimental designs when the independent variables are continuous and/or categorical. For this reason, Pedhazur and Schmelkin (1991) asserted that regression is superior to ANOVA. Pedhazur and Schmelkin also criticized the practice, in non-experimental designs, of converting continuous variables into categorical variables in order to fit the data into an ANOVA framework as if the design were experimental. This conversion not only leads to a loss of information, but also changes the nature of the variables and the design.
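The claim that regression subsumes ANOVA can be verified numerically. The sketch below (group sizes and means are made up for illustration) computes a one-way ANOVA F statistic by hand and then reproduces it exactly as a regression of the outcome on dummy-coded group membership:

```python
import numpy as np

rng = np.random.default_rng(1)

# Three hypothetical groups of 30 subjects each
groups = [rng.normal(loc=m, size=30) for m in (0.0, 0.5, 1.0)]
y = np.concatenate(groups)
k, n = len(groups), len(y)

# One-way ANOVA F statistic from the sums of squares
grand = y.mean()
ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

# The same test as a regression on dummy-coded group membership
X = np.zeros((n, k))
X[:, 0] = 1.0        # intercept
X[30:60, 1] = 1.0    # dummy for group 2
X[60:, 2] = 1.0      # dummy for group 3
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
ss_res = ((y - X @ beta) ** 2).sum()
ss_tot = ((y - grand) ** 2).sum()
f_reg = ((ss_tot - ss_res) / (k - 1)) / (ss_res / (n - k))

print(f"ANOVA F:      {f_anova:.4f}")
print(f"Regression F: {f_reg:.4f}")  # identical to the ANOVA F
```

The regression fits each group's mean exactly, so its residual sum of squares equals the within-group sum of squares and the two F statistics coincide: ANOVA is a special case of regression with categorical predictors.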
For beginners
Kerlinger (1986) and Shadish, Cook, and Campbell (2002) are two good books for getting started with experimental design, for neither book requires a strong mathematical or statistical background. Both concentrate on the design aspect rather than the analysis aspect.
Montgomery (2012) is an up-to-date and comprehensive book, though it is written for engineering majors. Readers should be able to follow the content after taking one or two introductory statistics courses. You may skip the chapter on response surfaces because it may not be applicable to educational and psychological research. Dr. Montgomery is a professor of Industrial Engineering at Arizona State University.
For intermediate users
Kennedy and Bush's (1985) book was written for graduate students in education and psychology who have a modest background in both mathematics and statistics and who are interested in a subject-matter field rather than in statistical methodology. One nice thing about the book is that it explains the mathematical notation, which is confusing to many readers.
For beginner and intermediate users
Levine and Parkinson (1994) is a book for both beginners and research professionals. The first half of the book covers experimental methods for psychologists in general, whereas the second half covers very detailed examples of experimental methods in cognitive psychology, social psychology, and clinical psychology. Levine and Parkinson are professors of psychology at Arizona State University.
For advanced users
Maxwell and Delaney (1990) and Winer, Brown, and Michels (1991) are considered classics in the field of experimental design. Their books cover both the design and the analysis aspects, but they require a very strong statistical background.
Last revised: 2017
- Appleton, D. R. & French, J. M. (1996). Ignoring a covariate: An example of Simpson's paradox. American Statistician, 50, 340-341.
- Babbie, E. (1992). The practice of social research (6th ed.). Belmont, CA: Wadsworth.
- Barkow, J. H. (1989). Darwin, sex, and status: Biological approaches to mind and culture. Toronto: University of Toronto Press.
- Berwick, D. (2008, August). Inference and improvement in health care. Paper presented at the 2008 Joint Statistical Meeting, Denver, CO.