5 Reading empirical studies

Chapter Outline

  1. Reading the results of empirical studies (16 minute read)
  2. Annotating empirical journal articles (15 minute read)
  3. Generalizability and transferability of empirical results (15 minute read)

Content warning: examples in this chapter contain references to domestic violence and details on types of abuse, drug use, poverty, mental health, sexual harassment and details on harassing behaviors, children’s mental health, LGBTQ+ oppression and suicide, obesity, anti-poverty stigma, and psychotic disorders.

5.1 Reading the results of empirical studies

Learning Objectives

Learners will be able to…

  • Describe how statistical significance and confidence intervals demonstrate which results are most important
  • Differentiate between qualitative and quantitative results in an empirical journal article

If you recall from section 3.1, empirical journal articles are those that report the results of quantitative or qualitative data analyzed by the author. They follow a set structure—introduction, methods, results, discussion/conclusions. This section is about reading the most challenging section: results.

I want to normalize not understanding statistics terms and symbols. However, a basic understanding of a results section goes a very long way to understanding the key results in an article. This will take you beyond the two or three sentences in the abstract that summarize the study’s results and into the nitty-gritty of what they found for each concept they studied.

Read beyond the abstract

At this point, I have read hundreds of literature reviews written by students. One of the challenges I have noted is that students will report the results as summarized in the abstract, rather than the detailed findings laid out in the results section of the article. This poses a problem when you are writing a literature review because you need to provide specific and clear facts that support your reading of the literature. The abstract may say something like: “we found that poverty is associated with mental health status.” For your literature review, you want the details, not the summary. In the results section of the article, you may find a sentence that states: “children living in households experiencing poverty are three times more likely to have a mental health diagnosis.” This more specific statistical information provides a stronger basis on which to build the arguments in your literature review.

Using the summarized results in an abstract is an understandable mistake to make. The results section often contains figures and tables that may be challenging to understand. Often, without having completed more advanced coursework on statistical or qualitative analysis, some of the terminology, symbols, or diagrams may be difficult to comprehend. This section is all about how to read and interpret the results of an empirical (quantitative or qualitative) journal article. Our discussion here will be basic, and in parts three and four of the textbook, you will learn more about how to interpret results from statistical tests and qualitative data analysis.

Remember, this section only addresses empirical articles. Non-empirical articles (e.g., theoretical articles, literature reviews) don’t have results sections of their own. Their authors summarize and cite analyses of raw data completed by other researchers rather than analyzing data themselves.

 

Quantitative results

Quantitative articles often contain tables, and scanning them is a good way to begin reading the results. A table usually provides a quick, condensed summary of the report’s key findings. Tables are a concise way to report large amounts of data. Some tables present descriptive information about a researcher’s sample (often the first table in a results section). These tables will likely contain frequencies (N) and percentages (%). For example, if gender happened to be an important variable for the researcher’s analysis, a descriptive table would show how many and what percent of all study participants are of a particular gender. Frequencies or “how many” will probably be listed as N, while the percent symbol (%) might be used to indicate percentages.

In a table presenting a causal relationship, two sets of variables are represented: the independent variable, or cause, and the dependent variable, or effect. We will discuss these further when we review quantitative conceptualization and measurement. Independent variable attributes are typically presented in the table’s columns, while dependent variable attributes are presented in rows. This allows the reader to scan a table’s rows to see how values on the dependent variable change as the independent variable values change (i.e., changes in the dependent variable depend on changes in the independent variable). Tables displaying results of quantitative analysis will also likely include some information about which relationships are significant or not. We will discuss the details of significance and p-values later in this section.

Let’s look at a specific example: Table 5.1. It presents the causal relationship between gender and experiencing harassing behaviors at work. In this example, gender is the independent variable (the cause) and the harassing behaviors listed are the dependent variables (the effects).[1] Therefore, we place gender in the table’s columns and harassing behaviors in the table’s rows.

Reading across the table’s top row, we see that 2.9% of women in the sample reported experiencing subtle or obvious threats to their safety at work, while 4.7% of men in the sample reported the same. We can read across each of the rows of the table in this way. Reading across the bottom row, we see that 9.4% of women in the sample reported experiencing staring or invasion of their personal space at work while just 2.3% of men in the sample reported having the same experience. We’ll discuss p-values later in this section.

 

Table 5.1 Percentage reporting harassing behaviors at work
Behavior experienced at work | Women | Men | p-value
Subtle or obvious threats to your safety | 2.9% | 4.7% | 0.623
Being hit, pushed, or grabbed | 2.2% | 4.7% | 0.480
Comments or behaviors that demean your gender | 6.5% | 2.3% | 0.184
Comments or behaviors that demean your age | 13.8% | 9.3% | 0.407
Staring or invasion of your personal space | 9.4% | 2.3% | 0.039
Note: Sample size was 138 for women and 43 for men.
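
Percentages are easier to interpret when you also know roughly how many people stand behind them. The short Python sketch below back-calculates approximate counts from the percentages and sample sizes in Table 5.1. The counts are our own rounded estimates, not figures reported by the original researchers, and the helper name implied_count is made up for this illustration.

```python
# Percentages reported in Table 5.1 and the sample sizes from its note.
SAMPLE_SIZE = {"women": 138, "men": 43}
PERCENT_REPORTING = {
    "Subtle or obvious threats to your safety": {"women": 2.9, "men": 4.7},
    "Being hit, pushed, or grabbed": {"women": 2.2, "men": 4.7},
    "Comments or behaviors that demean your gender": {"women": 6.5, "men": 2.3},
    "Comments or behaviors that demean your age": {"women": 13.8, "men": 9.3},
    "Staring or invasion of your personal space": {"women": 9.4, "men": 2.3},
}


def implied_count(percent, n):
    """Approximate number of people behind a reported percentage (rounded)."""
    return round(percent / 100 * n)


for behavior, by_gender in PERCENT_REPORTING.items():
    counts = {
        gender: implied_count(percent, SAMPLE_SIZE[gender])
        for gender, percent in by_gender.items()
    }
    print(
        f"{behavior}: about {counts['women']} of {SAMPLE_SIZE['women']} women, "
        f"about {counts['men']} of {SAMPLE_SIZE['men']} men"
    )
```

Seen this way, the 2.3% of men who reported staring or invasion of personal space corresponds to roughly one person out of 43, a useful reminder of how few people some cells in a table can represent.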

While you can certainly scan tables for key results, they are often difficult to understand without reading the text of the article. The article and table were meant to complement each other, and the text should provide information on how the authors interpret their findings. The table is not redundant with the text of the results section. Additionally, the first table in most results sections is a summary of the study’s sample, which provides more background information on the study than information about hypotheses and findings. It is also a good idea to look back at the methods section of the article. The data analysis plan the authors outline there should walk you through the steps they took to analyze their data, which will inform how they report their findings in the results section.

Statistical significance

The statistics reported in Table 5.1 represent what the researchers found in their sample. The purpose of statistical analysis is usually to generalize from the small number of people in a study’s sample to a larger population of people. Thus, the researchers intend to make causal arguments about harassing behaviors at workplaces beyond those covered in the sample.

Generalizing is key to understanding statistical significance. According to Cassidy and colleagues (2019),[2] 89% of research methods textbooks in psychology define statistical significance incorrectly. This includes an early draft of this textbook which defined statistical significance as “the likelihood that the relationships we observe could be caused by something other than chance.” If you have previously had a research methods class, this might sound familiar to you. It certainly did to me!

But statistical significance is less about “random chance” and more about the null hypothesis. Basically, at the beginning of a study a researcher develops a hypothesis about what they expect to find, usually that there is a statistical relationship between two or more variables. The null hypothesis is the opposite. It is the hypothesis that there is no relationship between the variables in a research study. Researchers then can hopefully reject the null hypothesis because they find a relationship between the variables.

For example, in Table 5.1 researchers were examining whether gender impacts harassment. Of course, researchers assumed that women were more likely to experience harassment than men. The null hypothesis, then, would be that gender has no impact on harassment. Once we conduct the study, our results will hopefully lead us to reject the null hypothesis because we find that gender impacts harassment. We would then generalize from our study’s sample to the larger population of people in the workplace.

Statistical significance is assessed using a p-value, which is obtained by comparing the study’s statistical results with a hypothetical set of results we would get if the researchers re-ran their study a large number of times. Keeping with our example, imagine we re-ran our study with different men and women from different workplaces hundreds and hundreds of times, and we assume that the null hypothesis is true: gender has no impact on harassment. If results like ours come up pretty often when the null hypothesis is true, our results probably don’t mean much. “The smaller the p-value, the greater the statistical incompatibility with the null hypothesis” (Wasserstein & Lazar, 2016, p. 131).[3] Generally, researchers in the social sciences have used 0.05 as the value at which a result is significant (p is less than 0.05) or not significant (p is greater than 0.05). A p-value of 0.05 means that 5% of those hypothetical results from re-running our study would show the same or more extreme relationships when the null hypothesis is true. Researchers, however, may choose a stricter standard such as 0.01, in which only 1% of those hypothetical results are as or more extreme, or a more lenient standard like 0.1, in which 10% of those hypothetical results are as or more extreme than what was found in the study.
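
The idea of re-running a study many times while assuming the null hypothesis is true can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name simulated_p_value, the shuffling approach (a permutation test), and the group counts in the usage lines are ours. The counts are hypothetical and are not taken from Table 5.1, and we do not know which statistical test the original researchers used.

```python
import random


def simulated_p_value(yes_a, n_a, yes_b, n_b, reruns=10_000, seed=0):
    """Share of simulated 'null' studies whose gap between groups is at
    least as large as the observed gap (a two-sided permutation test)."""
    observed_gap = abs(yes_a / n_a - yes_b / n_b)
    total_yes = yes_a + yes_b
    # Pool everyone together: 1 = reported the behavior, 0 = did not.
    pool = [1] * total_yes + [0] * (n_a + n_b - total_yes)
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(reruns):
        # If the null hypothesis is true, group membership is arbitrary,
        # so we reshuffle who ends up in which group.
        rng.shuffle(pool)
        sim_yes_a = sum(pool[:n_a])
        sim_yes_b = total_yes - sim_yes_a
        sim_gap = abs(sim_yes_a / n_a - sim_yes_b / n_b)
        if sim_gap >= observed_gap - 1e-12:  # tolerance for rounding error
            as_extreme += 1
    return as_extreme / reruns


# Hypothetical study: 30 of 100 people in group A and 15 of 100 people
# in group B report an experience.
p = simulated_p_value(30, 100, 15, 100)
print(f"Simulated p-value: {p:.3f}")
```

Each pass through the loop stands in for one hypothetical re-run of the study in a world where the groups do not differ. The p-value is simply the fraction of those re-runs that produced a gap as large as, or larger than, the one actually observed.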

Let’s look back at Table 5.1. Which one of the relationships between gender and harassing behaviors is statistically significant? It’s the last one in the table, “staring or invasion of personal space,” whose p-value is 0.039 (under the p<0.05 standard to establish statistical significance). Again, this indicates that if we re-ran our study over and over again and gender did not impact staring/invasion of space (i.e., the null hypothesis was true), only 3.9% of the time would we find similar or more extreme differences between men and women than what we observed in our study. Thus, we conclude that for staring or invasion of space only, there is a statistically significant relationship.

For contrast, let’s look at “being hit, pushed, or grabbed” and run through the same analysis to see if it is statistically significant. If we re-ran our study over and over again and the null hypothesis was true, 48% of the time (p=0.480) we would find similar or more extreme differences between men and women. Because 0.480 is well above the 0.05 standard, these results are not statistically significant.
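
The comparison we just walked through, checking each reported p-value against the 0.05 standard, can be written as a small Python sketch. The p-values are copied from Table 5.1. The function name significant_rows is our own, and 0.05 is the conventional threshold discussed above rather than a universal rule.

```python
ALPHA = 0.05  # conventional threshold; researchers may choose 0.01 or 0.1

# p-values exactly as reported in Table 5.1
P_VALUES = {
    "Subtle or obvious threats to your safety": 0.623,
    "Being hit, pushed, or grabbed": 0.480,
    "Comments or behaviors that demean your gender": 0.184,
    "Comments or behaviors that demean your age": 0.407,
    "Staring or invasion of your personal space": 0.039,
}


def significant_rows(p_values, alpha=ALPHA):
    """Return the rows whose p-value falls below the chosen threshold."""
    return [name for name, p in p_values.items() if p < alpha]


print(significant_rows(P_VALUES))
```

With the 0.05 threshold, only “Staring or invasion of your personal space” is returned. With a stricter threshold of 0.01, no row would qualify, which shows how much the label “significant” depends on the standard a researcher chooses.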

This discussion should also highlight a point we discussed previously: that it is important to read the full results section, rather than simply relying on the summary in the abstract. If the abstract stated that most tests revealed no statistically significant relationships between gender and harassment, you would have missed the detail on which behaviors were and were not associated with gender. Read the full results section! And don’t be afraid to ask for help from a professor in understanding what you are reading, as results sections are often not written to be easily understood.

Statistical significance and p-values have been critiqued recently for a number of reasons, including that they are misused and misinterpreted (Wasserstein & Lazar, 2016)[4], that researchers deliberately manipulate their analyses to have significant results (Head et al., 2015)[5], and that they factor into the difficulty scientists have today in reproducing many of the results of previous social science studies (Peng, 2015).[6] For this reason, we share these principles, adapted from those put forth by the American Statistical Association,[7] for understanding and using p-values in social science:

  1. P-values provide evidence against a null hypothesis.
  2. P-values do not indicate whether the results were produced by random chance alone or if the researcher’s hypothesis is true, though both are common misconceptions.
  3. Statistical significance can be detected in minuscule differences that have very little effect on the real world.
  4. Nuance is needed to interpret scientific findings, as a conclusion does not become true or false when the p-value passes from p=0.051 to p=0.049.
  5. Real-world decision-making must use more than reported p-values. It’s easy to run analyses of large datasets and only report the significant findings.
  6. Greater confidence can be placed in studies that pre-register their hypotheses and share their data and methods openly with the public.
  7. “By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis. For example, a p-value near 0.05 taken by itself offers only weak evidence against the null hypothesis. Likewise, a relatively large p-value does not imply evidence in favor of the null hypothesis; many other hypotheses may be equally or more consistent with the observed data” (Wasserstein & Lazar, 2016, p. 132).

Confidence intervals

Because of the limitations of p-values, scientists can use other methods to determine whether their models of the world are true. One common approach is to use a confidence interval, or a range of values in which the true value is likely to be found. Confidence intervals are helpful because, as principle #3 above points out, p-values do not measure the size of an effect (Greenland et al., 2016).[8] Remember, something that has very little impact on the world can be statistically significant, so the range of values in a confidence interval helps us judge how large an effect might be. In our example from Table 5.1, imagine our analysis produced a confidence interval indicating that women are 1.2 to 3.4 times more likely to experience “staring or invasion of personal space” than men. As with p-values, the calculation for a confidence interval compares what was found in one study with a hypothetical set of results if we repeated the study over and over again. If we calculated a 95% confidence interval for each of those hundreds and hundreds of hypothetical studies, about 95% of those intervals would contain the true value.
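
To make this more concrete, here is a minimal Python sketch of one common way to calculate a confidence interval for a percentage, along with a simulation of the “repeat the study over and over” idea. It uses a normal-approximation interval, which is our assumption and not necessarily what the original researchers would use. The function names proportion_ci and coverage, the count of 19 women (our rounded estimate of 13.8% of 138), and the simulated population value of 30% are all chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


def coverage(true_p, n, reruns=2000, level=0.95, seed=0):
    """Share of simulated studies whose interval contains the true value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reruns):
        # One simulated study: n people, each reporting with chance true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = proportion_ci(successes, n, level)
        if low <= true_p <= high:
            hits += 1
    return hits / reruns


# Table 5.1: 13.8% of 138 women (about 19 people) reported comments or
# behaviors that demean their age.
low, high = proportion_ci(19, 138)
print(f"95% confidence interval: {low:.1%} to {high:.1%}")

# Hypothetical population in which 30% report an experience, studied
# over and over with 200 people each time.
print(f"Intervals containing the true value: {coverage(0.30, 200):.1%}")
```

For the Table 5.1 row, the interval runs from roughly 8% to 20%, which is a much more cautious statement than “13.8% of women.” The simulation should show that close to 95% of the intervals capture the true value, which is what the “95%” in a 95% confidence interval refers to.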

Confidence intervals are pretty intuitive. As of this writing, my wife and I are expecting our second child. The doctor told us our due date was December 11th. But the doctor also told us that December 11th was only their best estimate. They were actually 95% sure our baby would be born at some point in the four-week period between November 27th and December 25th. Confidence intervals are often listed with a percentage, like 90% or 95%, and a range of values, such as between November 27th and December 25th. You can read that as: “because we’ve studied hundreds of thousands of fetuses and mothers, we are 95% sure your baby will be born between November 27th and December 25th.”

Notice that we’re hedging our bets here by using words like “best estimate.” When testing hypotheses, social scientists generally phrase their findings in a tentative way, talking about what results “indicate” or “support,” rather than making bold statements about what their results “prove.” Social scientists have humility because they understand the limitations of their knowledge. In a literature review, using a single study or fact to “prove” an argument right or wrong is often a signal to the person reading your literature review (usually your professor) that you may not have appreciated the limitations of that study or its place in the broader literature on the topic. Strong arguments in a literature review include multiple facts and ideas that span across multiple studies.

You can learn more about creating tables, reading tables, and tests of statistical significance in a class focused exclusively on statistical analysis. We provide links to many free and openly licensed resources on statistics in Chapter 16. For now, we hope this brief introduction to reading tables will improve your confidence in reading and understanding the results sections in quantitative empirical articles.

Qualitative results

Quantitative articles will contain a lot of numbers and the results of statistical tests demonstrating associations between those numbers. Qualitative articles, on the other hand, will consist mostly of quotations from participants. For most qualitative articles, the authors want to put their results in the words of their participants, as they are the experts. Articles that lack quotations make it difficult to assess whether the researcher interpreted the data in a trustworthy, unbiased manner. Qualitative articles may also indicate how often particular themes or ideas came up in the data, potentially reflective of how important they were to participants.

Authors often organize qualitative results by themes and subthemes. For example, see this snippet from the results section in Bonanno and Veselak (2019)[9] discussing parents’ attitudes towards child mental health information sources.

Data analysis revealed four themes related to participants’ abilities to access mental health help and information for their children, and parents’ levels of trust in these sources. These themes are: others’ firsthand experiences, family and friends with professional experience, protecting privacy, and uncertainty about schools as information sources. Trust emerged as an overarching and unifying concept for all of these themes.

Others’ firsthand experiences. Several participants reported seeking information from other parents who had experienced mental health struggles similar to their own children. They often referenced friends or family members who had been or would be good sources of information due to their own personal experiences. The following quote from Adrienne demonstrates the importance of firsthand experience:

[I would only feel comfortable sharing concerns or asking for advice] if I knew that they had been in the same situation. (Adrienne)

Similarly, Michelle said: And I talked to a friend of mine who has kids who have IEPs in the district to see, kind of, how did she go about it. (Michelle)

Friends/family with professional experience. Several respondents referred to friends or family members who had professional experience with or knowledge of child mental health and suggested that these individuals would be good sources of information. For example, Hannah said:

Well, what happened with me was I have an uncle who’s a psychiatrist. Sometimes if he’s up in (a city to the north), he’s retired, I can call him sometimes and get information. (Hannah)

Michelle, who was in nursing school, echoed this sentiment: At this point, [if my child’s behavioral difficulties continued], I would probably call one of my [nursing] professors. That’s what I’ve done in the past when I’ve needed help with certain things…I have a professor who I would probably consider a friend who I would probably talk to first. She has a big adolescent practice. (Michelle) (p. 402-403)

The terms that open each paragraph of the excerpt above (e.g., “Others’ firsthand experiences”) refer to the key themes (i.e., qualitative results) that were present in the data. Researchers will state the process by which they interpret each theme, providing a definition and usually some quotations from research participants. Researchers will also draw connections between themes, note consensus or conflict over themes, and situate the themes within the study context.

Qualitative results are specific to the time, place, and culture in which they arise, so you will have to use your best judgment to determine whether these results are relevant to your study. For example, students in my class at Radford University in Southwest Virginia may be studying rural populations. Would a study on group homes in a large urban city transfer well to group homes in a rural area?

Maybe. But even if you were using data from a qualitative study in another rural area, are all rural areas the same? How are the client population and sociocultural context in the article similar to or different from those in your study? Qualitative studies have tremendous depth, but researchers must be intentional about drawing conclusions about one context based on a study in another context. To make conclusions about how a study applies in another context, researchers need to examine each component of an empirical journal article. In other words, they need to annotate!

Key Takeaways

  • The results sections of empirical articles are often the most difficult to understand.
  • To understand a quantitative results section, look for results that were statistically significant and examine the confidence interval, if provided.
  • To understand a qualitative results section, look for definitions of themes or codes and use the quotations provided to understand the participants’ perspective.

Exercises

Select a quantitative empirical article related to your topic.

  • Write down the results the authors identify as statistically significant in the results section.
  • How do the authors interpret their results in the discussion section?
  • Do the authors provide enough information in the introduction for you to understand their results?

Select a qualitative empirical article relevant to your topic.

  • Write down the key themes the authors identify and how they were defined by the participants.
  • How do the authors interpret their results in the discussion section?
  • Do the authors provide enough information in the introduction for you to understand their results?

5.2 Annotating empirical journal articles

Learning Objectives

Learners will be able to…

  • Define annotation and describe how to use it to identify, extract, and reflect on the information you need from an article

Annotation refers to the process of writing notes on an article. There are many ways to do this. The most basic technique is to print out the article and build a binder related to your topic. Raul Pacheco-Vega’s excellent blog has a post on his approach to taking physical notes. Honestly, while you are there, browse around that website. It is full of amazing tips for students conducting a literature review and graduate research projects. I see a lot of benefits to the paper, pen, and highlighter approach to annotating articles. Personally though, I prefer to use a computer to write notes on an article because my handwriting is terrible and typing notes allows me to search for keywords. For other students, electronic notes work best because they cannot afford to print every article that they will use in their paper. No matter what you use, the point is that you need to write notes when you’re reading. Reading is research!

There are a number of free software tools you can use to help you annotate a journal article. Most PDF readers like Adobe Acrobat have a commenting and highlighting feature, though the PDF viewers built into internet browsers like Google Chrome, Microsoft Edge, and Safari may offer only limited versions of these features, if any. The best approach may be to use a citation manager like Zotero. Using a citation manager, you can build a library of articles, save your annotations, and link annotations across PDFs using keywords. Citation managers also integrate with word processing programs to help with citations and reference lists.

See this video tutorial from McGill Library on how to set up and use Zotero for college students. Seriously, it will produce correct APA citations for you, organize your references, and host all of your annotations for free!

 

Of course, I don’t follow this advice because I have a system that works well for me. I have a PDF open in one computer window and a Word document open in a window next to it. I type notes and copy quotes, listing the page number for each note I take. It’s a bit low-tech, but it does make my notes searchable. This way, when I am looking for a concept or quote, I can simply search my notes using the Find feature in Word and get to the information I need.

Annotating and reviewing literature do not have to be solo projects. If you are working in a group, you can use the Hypothes.is web browser extension to annotate articles collaboratively. You can also use Google Docs to collaboratively annotate a shared PDF using the commenting feature and write collaborative notes in a shared document. By sharing your highlights and comments, you can split the work of getting the most out of each article you read and build off one another’s ideas.

Common annotations

In this section, we present common annotations people make when reading journal articles. These annotations are adapted from Craig Whippo and Raul Pacheco-Vega. If you are annotating on paper, I suggest using different color highlighters for each type of annotation listed below. If you are annotating electronically, you can use the names below as tags to easily find information later. For example, if you are searching for definitions of key concepts, you can either click on the tag for [definitions] in your PDF reader or thumb through a printed copy of the article for whatever color or tag you used to indicate definitions of key terms. Most of all, you want to avoid reading through all of your sources again just to find that one thing you know you read somewhere. Time is a graduate student’s most valuable resource, so our goal here is to help you spend your time reading the literature wisely.
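
If you are comfortable with a little programming, the same tagging idea can be sketched in Python. This is purely an illustration and not a replacement for a citation manager: the Note structure, the find_notes function, and the example notes (including the article names and page numbers) are all hypothetical. The tags follow the annotation types described in this section.

```python
from dataclasses import dataclass


@dataclass
class Note:
    source: str  # which article the note comes from
    page: int    # page number, so the passage can be found and cited later
    tag: str     # annotation type, e.g. "definitions"
    text: str    # the note itself or a copied quote


def find_notes(notes, tag=None, keyword=None):
    """Return notes that match a tag and/or contain a keyword."""
    results = []
    for note in notes:
        if tag is not None and note.tag != tag:
            continue
        if keyword is not None and keyword.lower() not in note.text.lower():
            continue
        results.append(note)
    return results


# Hypothetical notes; the sources and page numbers are made up.
notes = [
    Note("Article A", 3, "definitions", "Definition of economic abuse appears here."),
    Note("Article A", 7, "personal reflections", "Unclear how the sample was recruited."),
    Note("Article B", 12, "definitions", "Narrower definition of domestic violence."),
]

for note in find_notes(notes, tag="definitions"):
    print(f"{note.source}, p. {note.page}: {note.text}")
```

Recording the page number with every note is the important design choice here, because it lets you return to the exact passage when you write your literature review.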

Personal reflections

Personal reflections are all about you. What do you think? Are there any areas you are confused about? Any new ideas or reflections come to mind while you’re reading? Treat these annotations as a means of capturing your first reflections about an article. Write down any questions or thoughts that come to mind as you read. If you think the author says something inaccurate or unsubstantiated, write that down. If you don’t understand something, make a note about it and ask your professor. Don’t feel bad! Journal articles are hard to understand sometimes, even for professors. Your goal is to critically read the literature, so write down what you think while reading! Table 5.2 contains some questions that might stimulate your thoughts.

 

Table 5.2 Questions worth asking while reading research reports
Report section | Questions worth asking
Abstract | What are the key findings? How were those findings reached? How does the author frame their study?
Acknowledgments | Who are this study’s major stakeholders? Who provided feedback? Who provided support in the form of funding or other resources?
Problem statement (introduction) | How does the author frame the research focus? What other possible ways of framing the problem exist? Why might the author have chosen this particular way of framing the problem?
Literature review (introduction) | What are the major themes the author identifies in the literature? Are there any gaps in the literature? Does the author address challenges or limitations to the studies they cite? Is there enough literature to frame the rest of the article or do you have unanswered questions? Does the author provide conceptual definitions for important ideas or use a theoretical perspective to inform their analysis?
Sample (methods) | Where was the data collected? Did the researchers provide enough information about the sample and sampling process for you to assess its quality? Did the researchers collect their own data or use someone else’s data? What population is the study trying to make claims about, and does the sample represent that population well? What are the sample’s major strengths and major weaknesses?
Data collection (methods) | How were the data collected? What do you know about the relative strengths and weaknesses of the methods employed? What other methods of data collection might have been employed, and why was this particular method employed? What do you know about the data collection strategy and instruments (e.g., questions asked, locations observed)? What don’t you know about the data collection strategy and instruments? Look for appendixes and supplementary documents that provide details on measures.
Data analysis (methods) | How were the data analyzed? Is there enough information provided for you to feel confident that the proper analytic procedures were employed accurately? How open are the data? Can you access the data in an open repository? Did the researchers register their hypotheses and methods prior to data collection? Is there a data disclosure statement available?
Results | What are the study’s major findings? Are findings linked back to previously described research questions, objectives, hypotheses, and literature? Are sufficient amounts of data (e.g., quotes and observations in qualitative work, statistics in quantitative work) provided to support conclusions? Are tables readable?
Discussion/conclusion | Does the author generalize to some population beyond the sample? How are these claims presented? Are claims supported by data provided in the results section (e.g., supporting quotes, statistical significance)? Have limitations of the study been fully disclosed and adequately addressed? Are implications sufficiently explored?

Definitions

Note definitions of key terms for your topic. At minimum, you should include a scholarly definition for the concepts represented in your working question. If your working question asks about the process of leaving a relationship with domestic violence, your research proposal will have to explain how you define domestic violence, as well as how you define “leaving” an abusive relationship. While you may already know what you mean by domestic violence, the person reading your research proposal does not.

Annotating definitions also helps you engage with the scholarly debate around your topic. Definitions are often contested among scholars. Some definitions of domestic violence will be more comprehensive, including things such as economic abuse or forcing the victim to problematically use substances. Other definitions will be less comprehensive, covering only physical, verbal, and sexual abuse. Often, how someone defines something conceptually is highly related to how they measure it in their study. Since you will have to do both of these things, find a definition that feels right to you or create your own, noting the ways in which it is similar or different from those in the literature.

Definitions are also an important way of dealing with jargon. Becoming familiar with a new content area involves learning the jargon experts use. For example, in the last paragraph I used the term economic abuse, but that’s probably not a term you’ve heard before. If you were conducting a literature review on domestic violence, you would want to search for keywords like economic abuse if they are relevant to your working question. You will also want to know what they mean so you can use them appropriately in designing your study and writing your literature review.

Theoretical perspective

Noting the theoretical perspective of the article can help you interpret the data in the same manner as the author. For example, articles on supervised injection facilities for people who use intravenous drugs most likely come from a harm reduction perspective, and understanding the theory behind harm reduction is important to make sense of empirical results. Articles should be grounded in a theoretical perspective that helps the author conceptualize and understand the data. As we discussed in Chapter 3, some journal articles are entirely theoretical and help you understand the theories or conceptual models related to your topic. We will help you determine a theoretical perspective for your project in Chapter 7. For now, it’s a good idea to note what theories authors mention when talking about your topic area. Some articles are better about this than others, and many authors make it a bit challenging to find theory (if mentioned at all). In other articles, it may help to note which social work theories are missing from the literature. For example, a study’s findings might address issues of oppression and discrimination, but the authors may not use critical theory to make sense of what happened.

Background knowledge

It’s a good idea to note any relevant information the author relies on for background. When an author cites facts or opinions from others, you are subsequently able to get information from multiple articles simultaneously. For example, if we were looking at this meta-analysis about domestic violence, in the introduction section, the authors provide facts from many other sources. These facts will likely be relevant to your inquiry on domestic violence, as well.

As you are looking at background information, you should also note any subtopics or concepts about which there is controversy or consensus. The author may present one viewpoint and then an opposing viewpoint, something you may do in your literature review as well. Similarly, they may present facts that scholars in the field have come to consensus on and describe the ways in which different sources support these conclusions.

Sources of interest

Note any relevant sources the author cites. If there is any background information you plan to use, note the original source of that information. When you write your literature review, cite the original source of a piece of information you are using, which may not be where you initially read it. Remember that you should read and refer to the primary source. If you are reading Article A and the author cites a fact from Article B, you should note Article B in your annotations and use Article B when you cite the fact in your paper. You should also make sure Article A interpreted Article B correctly and scan Article B for any other useful facts.

Research question/Purpose

Authors should be clear about the purpose of their article. Charitable authors will give you a sentence that starts with something like this:

  • “The purpose of this research project was…”
  • “Our research question was…”
  • “The research project was designed to test the following hypothesis…”

Unfortunately, not all authors are so clear, and you may have to hunt around for the research question or hypothesis. Generally, in an empirical article, the research question or hypothesis is at the end of the introduction. In non-empirical articles, the author will likely discuss the purpose of the article in the abstract or introduction.

Results

We discussed how to read the results of empirical articles in greater detail in section 5.1. For now, just know that you should highlight any of the key findings of an article. They will be described very briefly in the abstract, and in much more detail in the article itself. In an empirical article, you should look at both the ‘Results’ and ‘Discussion’ sections. For a non-empirical article, the key findings will likely be in the conclusion. You can also find them in the topic or concluding sentences in a paragraph within the body of the article.

Measures

How do researchers know something when they see it? Found in the ‘Methods’ section of empirical articles, the measures section is where researchers spell out the tools, or measures, they used to gather data. For quantitative studies, you will want to get familiar with the questions researchers typically use to measure key variables. For example, to measure domestic violence, researchers often use the Conflict Tactics Scale. The more frequently used and cited a measure is, the more we know about how well it works (or not). Qualitative studies will often provide at least some of the interview or focus group questions they used with research participants. They will also include information about how their inquiry and hypotheses may have evolved over time. Keep in mind, however, that sometimes important information is cut out of an article during editing. If you need more information, consider reaching out to the author directly. Before you do so, check if the author provided an appendix with the information you need or if the article links to their data and measures as part of open data sharing practices.

Sample

Who exactly were the study participants and how were they recruited? In quantitative studies, you will want to pay attention to the sample size. Generally, the larger the sample, the greater the study’s explanatory power. Additionally, randomly drawn samples are desirable because they leave any variation up to chance. Samples that are drawn out of convenience can be biased and non-representative of the larger population. In qualitative studies, non-random sampling is appropriate, but consider this: how well does what we find for this group of people transfer to the people who will be in your study? For both qualitative and quantitative studies, look for how well the sample is described and whether there are important characteristics missing from the article that you would need to determine the quality of the sample.
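To see why sample size matters, here is a minimal sketch in Python. It assumes a simple random sample and uses the standard approximation for the 95% margin of error of a proportion. The 50% figure and the sample sizes are hypothetical, and `margin_of_error` is a name made up for this illustration.

```python
import math


def margin_of_error(proportion, n, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes a simple random sample of size n; z=1.96 is the usual
    multiplier for a 95% confidence level.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / n)
    return z * standard_error


# Hypothetical result: 50% of participants report the outcome.
for n in (25, 100, 400, 1600):
    moe = margin_of_error(0.5, n)
    print(f"n = {n}: margin of error is about {moe:.3f}")
```

Each time the sample size quadruples, the margin of error is cut in half. Keep in mind that this calculation only speaks to random samples. A convenience sample can stay biased no matter how many people participate.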

Limitations

Honest authors will include these at the end of each article, but you should also note any additional limitations you find with their work.

Your annotations

These are just a few suggested annotations, but you can come up with your own. For example, maybe there are annotations you would use for different assignments or for the problem statement in your research proposal. If you have an argument or idea that keeps coming to mind when you read, consider creating an annotation for it so you can remember which part of each article supports your ideas. Whatever works for you. The goal with annotation is to extract as much information as possible from each article while reading, so you don’t have to go back through everything again. It’s useless to read an article and forget most of what you read. Annotate!

Key Takeaways

  • Begin your search by reading thorough and cohesive literature reviews. Review articles are great sources of information to get a broad perspective of your topic.
  • Don’t read an article just to say you’ve read it. Annotate and take notes so you don’t have to re-read it later.
  • Use software or paper-and-pencil approaches to write notes on articles.
  • Annotation is best used when closely reading an empirical study highly similar to your research project.

Exercises

  • Select an empirical article highly related to the study you would like to conduct.
  • Annotate the article using the aforementioned annotations and create some of your own.
  • Create the first draft of a summary table with key information from this empirical study that you would like to compare to other empirical studies you closely read.

5.3 Generalizability and transferability of empirical results

Learning Objectives

Learners will be able to…

  • Define generalizability and transferability.
  • Assess how generalizability and transferability shape the way researchers use the results of empirical research studies to make arguments about what is objectively true.
  • Relate both concepts to the hierarchy of evidence and the types of articles in the scholarly literature.

Now that you have read an empirical article in detail, it’s important to put its results in conversation with the broader literature on your topic. In this section, we discuss two important concepts, generalizability and transferability, and the interrelationship between the two. We also explain how these two properties of empirical data impact your literature review and evidence-based practice.

Generalizability

The figure below provides a common approach to assessing empirical evidence. As you move up the pyramid below, you can be more sure that the data contained in those studies generalizes to all people who experience the issue.

An evidence pyramid with case studies on bottom and systematic reviews on top. It reviews how each stage builds on top of the next in improving quality of evidence
Figure 5.1 Quality of evidence by type of article

As we reviewed in Chapter 1, objective truth is true for everyone, regardless of context. In other words, objective truths generalize beyond the sample of people from whom data were collected to the larger population of people who experience the issue under examination. You can be much more sure that information from a systematic review or meta-analysis will generalize than something from a case study of a single person, pilot projects, and other studies that do not seek to establish generalizability.

The type of article listed here is also related to the types of research methods the authors used. While we cover many of these approaches in this textbook, some of them (like cohort studies) are somewhat less common in social work. Additionally, there is one important research method, survey design, that does not appear in this diagram. Finally, social work research uses many different types of qualitative research, some of which generate more generalizable data than others.

For a refresher on the different types of evidence available in each type of article, refer back to section 4.1. You’ll recall the hierarchy of evidence as described by McNeece & Thyer (2004)[10]:

  1. Systematic reviews and meta-analyses
  2. Randomized controlled trials
  3. Quasi-experimental studies
  4. Case-control and cohort studies
  5. Pre-experimental (or non-experimental) group studies
  6. Surveys
  7. Qualitative studies

Because there is further variation in the types of studies used by social work researchers, I expanded the hierarchy of evidence to cover a greater breadth of research methods in Figure 5.3.

Refined information from multiple sources

The top of the hierarchy represents refined scientific information or meta-research. Meta-research uses the scientific method to analyze and improve the scientific production of knowledge. For example, meta-analyses pull together samples of people from all high-quality studies on a given topic area creating a super-study with far more people than any single researcher could feasibly collect data from. Because scientists (and clinical experts) refine data across multiple studies, these represent the most generalizable research findings.
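To make the idea of a "super-study" more concrete, here is a minimal sketch in Python of one common way to pool results: a fixed-effect, inverse-variance weighted average. This chapter does not specify how any particular meta-analysis was computed, so treat this as an illustration under stated assumptions. The three effect sizes and standard errors are hypothetical, and `pool_fixed_effect` is a name made up for this example. Real meta-analyses often use more elaborate random-effects models.

```python
import math


def pool_fixed_effect(effects, standard_errors):
    """Combine study effects using inverse-variance weights.

    More precise studies (smaller standard errors) get more weight.
    Returns the pooled effect and its standard error.
    """
    weights = [1 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled_se


# Hypothetical effect sizes from three studies and their standard errors.
effects = [0.30, 0.50, 0.40]
standard_errors = [0.20, 0.10, 0.25]

pooled, pooled_se = pool_fixed_effect(effects, standard_errors)
print(f"Pooled effect: {pooled:.3f} (standard error {pooled_se:.3f})")
```

Notice that the pooled standard error is smaller than the standard error of any single study. That added precision is one reason meta-analyses sit at the top of the hierarchy.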

Of course, not all meta-analyses or systematic reviews are of good quality. As a peer reviewer for a scholarly journal, I have seen poor quality systematic reviews that make methodological mistakes—like not including relevant keywords—that lead to incorrect conclusions. Unfortunately, not all errors are caught in the peer review process, and not all limitations are acknowledged by the authors. Just because you are looking at a systematic review does not mean you are looking at THE OBJECTIVE TRUTH. Nevertheless, you can be pretty sure that results from these studies are generalizable to the population in the study’s research question.

A good way to visualize the process of sampling is by examining the procedure used for systematic reviews and meta-analyses to scientifically search for articles. In Figure 5.4 below, you can see how researchers conducting a systematic review identified a large pool of potentially relevant articles, downloaded and analyzed them for relevance, and in the end, analyzed only 71 articles in their systematic review out of a total of 1,589 potentially relevant articles. Because systematic reviews or meta-analyses are intended to make strong, generalizable conclusions, they often exclude studies that still contain good information.
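The arithmetic behind that screening process is worth a quick look. This short Python sketch uses only the two numbers reported above; the variable names are mine.

```python
# Counts reported for the systematic review described above.
identified = 1589  # potentially relevant articles found in the search
included = 71      # articles analyzed in the final review

excluded = identified - included
share_included = included / identified

print(f"Excluded: {excluded} articles")
print(f"Included: {share_included:.1%} of the articles identified")
```

Fewer than one in twenty of the articles identified made it into the final analysis, which is why the excluded studies may still be worth your time.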

In the process of selecting articles for a meta-analysis and systematic review, researchers may exclude articles with important information for a number of good reasons. No study is perfect, and all research methods decisions come with limitations–including meta-research. Authors conducting a meta-analysis cannot include a study unless researchers provide data for the authors to include in their meta-analysis, and many empirical journal articles do not make their data available. Additionally, a study’s intervention or measures may be a bit different than what researchers want to make conclusions about. This is a key truth applicable across all articles you read—who or what gets selected for analysis in a research project determines how well the project’s results generalize to everyone.

We will talk about this in future chapters as sampling, and in those chapters, we will learn which sampling approaches are intended to support generalizability and which are used for other purposes. For example, availability or convenience sampling is often used to get quick information while random sampling approaches are intended to support generalizability. It is impossible to know everything about your article right now, but by the end of this course, you will have the information you need to critically examine the generalizability of a sample.
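If you are curious why the sampling approach matters so much, here is a small simulation sketched in Python. Everything in it is hypothetical: the population of 10,000 students, the true rate of 20% who found a training unhelpful, and especially the assumption that dissatisfied students are three times as likely to volunteer for a convenience sample. It is meant to illustrate the logic, not to describe any real study.

```python
import random

rng = random.Random(42)  # fixed seed so the simulation is repeatable

# Hypothetical population: 10,000 students, 20% of whom found a
# training unhelpful (True) and 80% of whom did not (False).
population = [True] * 2000 + [False] * 8000

# Random sampling: every student has the same chance of being selected.
random_sample = rng.sample(population, 500)
random_estimate = sum(random_sample) / len(random_sample)

# Convenience sampling (assumed mechanism): students who found the
# training unhelpful are three times as likely to volunteer.
# Note that rng.choices draws with replacement, a simplification.
weights = [3 if unhelpful else 1 for unhelpful in population]
convenience_sample = rng.choices(population, weights=weights, k=500)
convenience_estimate = sum(convenience_sample) / len(convenience_sample)

print(f"True rate in the population: {sum(population) / len(population):.0%}")
print(f"Estimate from random sample: {random_estimate:.0%}")
print(f"Estimate from convenience sample: {convenience_estimate:.0%}")
```

Under these assumptions, the random sample should land close to the true rate of 20%, while the convenience sample should overstate it considerably (about 43% on average). Both samples include the same number of people. What differs is who had a chance to end up in them.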

Primary sources (empirical studies)

Because refined sources like systematic reviews exclude good studies, they are only a first step in getting to know a topic area. You will need to examine primary sources–the reports of researchers who conducted empirical studies–to make evidence-based conclusions about your topic. Figure 5.3 describes three different types of data and ranks them vertically based on how well you can be sure the information generalizes.

As we will discuss further in our chapter on causal explanations, a key factor in scientifically assessing causality is establishing what happened first. Researchers conducting intervention studies are causing change by providing therapy, housing, or whatever the intervention is and measuring the outcomes of that intervention after they happen. This is unlike survey researchers, who do not introduce an intervention but ask people to self-report information on a questionnaire. Longitudinal surveys are particularly helpful because they can provide a clearer picture of whether the cause came before the effect in a causal relationship, but because they are expensive and time-consuming to conduct, longitudinal studies are relatively rare in the literature and most surveys measure people at only one point in time. Thus, because survey researchers cannot tightly control the causal variable (an intervention, an experience of abuse, etc.), we can be somewhat less certain of the conclusions of surveys than of the conclusions of experiments. At the same time, because surveys measure people in their naturalistic environment rather than in a laboratory or artificial setting, they may do a better job at reducing the potential for the researcher to influence the data a participant provides. Surveys also provide descriptive information (like the number of people with a diagnosis or risk factor) that experiments cannot provide.

Surveys and experiments are commonly used in social work, and we will describe the methods they use in future chapters. When assessing the generalizability of a given survey or experiment, you are looking at whether the methods used by the researchers improve generalizability (or, at least, whether those methods are intended to improve generalizability). Specifically, there are sampling, measurement, and design decisions that researchers make that can improve generalizability. And once the study is conducted, whether those methods worked as intended also impacts generalizability.

We address sampling, measurement, and design in the coming chapters, and you will need more in-depth knowledge of research methods to assess the generalizability of the results you are reading. In the meantime, Figure 5.3 is organized by design, and this is a good starting point for your inquiry since it only requires you to identify the design in each empirical article–which should be included in the abstract and described in detail in the methods section. For more information on how to conduct sampling, measurement, and design in a way that maximizes generalizability, read Part 2 of this textbook.

When searching for the design of a study, look for specific keywords that indicate the researcher used methods that do not generalize well, like pilot study, pre-experiment, non-experiment, convenience sample, availability sample, and exploratory study. When researchers are seeking to perform a pilot study, they are optimizing for time, not generalizability. Their results may still be useful to you! But you should not generalize from their study to all people with the issue under analysis without a lot of caution and additional supporting evidence. Instead, you should see whether the lessons from this study might transfer to the context in which you are researching, which is our next topic.

Qualitative studies use sampling, measures, and designs that do not try to optimize generalizability. Suppose the results of a qualitative study indicate that 10 out of 50 students who participated in focus groups found the mandatory training on harassment to be unhelpful. Does that mean 20% of all college students at this university find it unhelpful? No. Focus groups, interviews, and the other qualitative methods we will discuss are not designed with generalizability in mind, so it would not make sense to generalize from focus groups to all people in a population. Instead, focus group methods optimize for trustworthy and authentic research projects that make sure, for example, all themes and quotes in the researcher’s report are traceable to quotes from focus group participants. Instead of providing what is generally true, qualitative research provides a thick description of people’s experiences so you can understand them. Subjective inquiry is less generalizable but provides greater depth in understanding people’s feelings, beliefs, and decision-making processes within their context.

In Figure 5.3, you will note that some qualitative studies are ranked higher than others in terms of generalizability. Meta-syntheses are ranked highest because they are meta-research, pooling together the themes and raw data from multiple qualitative studies into a super-study. A meta-synthesis is the qualitative equivalent of a meta-analysis, which analyzes quantitative data. The researchers conducting meta-syntheses aim to make broader generalizations across research studies, even though generalizability is not strictly the goal. In a similar way, grounded theory studies (a type of qualitative design) aim to produce a testable hypothesis that could generalize. At the bottom of the hierarchy are individual case studies, which report what happens with a single person, organization, or event. It’s best not to think too long about the generalizability of qualitative results. When examining qualitative articles, you should be examining their transferability, our topic for the next subsection.

Transferability

Generalizability asks one question: How well does the sample of people in this study represent everyone with this issue? If you read in a study that 50% of people in the sample experienced depression, does that mean 50% of everyone experiences depression? Earlier in this section, we previewed the specific quantitative research methods, covered later in this textbook, that researchers use to optimize the generalizability of results. By adhering strictly to best practices in sampling, measurement, and design, researchers can provide you with good evidence for the generalizability of their study’s results.

Of course, generalizability is not the only question worth asking. Just because a study’s sample represents a broader population does not mean it is helpful for making conclusions about your working question. In assessing a study’s transferability, you are making a weaker but compelling argument that the conclusions of one study can be applied to understanding the people in your working question and research project. Generalizable results may be applicable because they are broadly transferable across situations, and you can be confident in that when they follow the best practices in this textbook for improving generalizability. However, there may be aspects of a study that make its results difficult to transfer to your topic area.

When evaluating the transferability of a research result to your working question, consider the sample, measures, and design. That is, consider who the individuals in the study are, how data were collected from them, and what researchers did with them. You may find that the samples in generalizable studies do not talk about the specific ethnic, cultural, or geographic group that is in your working question. Similarly, studies that measure the outcomes of substance use treatment by measuring sobriety may not match your working question on moderation, medication adherence, or substitution as an outcome in substance use treatment. Evaluating the transferability of designs may help you identify whether the methods the authors used would be similar to those you might use if you were to conduct a study collecting your own raw data.

Assessing transferability is more subjective. You are using your knowledge of your topic area and research methods (which are always improving!) to make a reasonable argument about why a given piece of evidence from a primary source helps you understand something. Look back at Table 5.2, your annotations, and the researchers’ sampling, data analysis, results, and design. Using your critical thinking (and the knowledge you gain in Part 2 and Part 3 of this textbook), you will need to make a reasonable argument that these results transfer to the people, places, and culture that you are talking about in your working question.

In the final chapter of Part 1, we will discuss how to assemble the facts you have taken from journal articles into a literature review that represents what you think about the topic.


  1. It wouldn’t make any sense to say that people’s workplace experiences cause their gender, so in this example, the question of which is the independent variable and which are the dependent variables has a pretty obvious answer.
  2. Cassidy, S. A., Dimova, R., Giguère, B., Spence, J. R., & Stanley, D. J. (2019). Failing grade: 89% of introduction-to-psychology textbooks that define or explain statistical significance do so incorrectly. Advances in Methods and Practices in Psychological Science, 2(3), 233-239.
  3. Wasserstein, R. L., & Lazar, N. A. (2016). The ASA statement on p-values: Context, process, and purpose. The American Statistician, 70, 129-133.
  4. Wasserstein, R. L., & Lazar, N. A. (2016). The ASA statement on p-values: Context, process, and purpose. The American Statistician, 70, 129-133.
  5. Head, M. L., Holman, L., Lanfear, R., Kahn, A. T., & Jennions, M. D. (2015). The extent and consequences of p-hacking in science. PLoS Biology, 13(3).
  6. Peng, R. (2015). The reproducibility crisis in science: A statistical counterattack. Significance, 12, 30-32.
  7. Wasserstein, R. L., & Lazar, N. A. (2016). The ASA statement on p-values: Context, process, and purpose. The American Statistician, 70, 129-133.
  8. Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337-350.
  9. Bonanno, R., & Veselak, K. (2019). A matter of trust: Parents attitudes towards child mental health information sources. Advances in Social Work, 19(2), 397-415.
  10. McNeece, C. A., & Thyer, B. A. (2004). Evidence-based practice and social work. Journal of Evidence-Based Social Work, 1(1), 7-25.
License

Scientific Inquiry in Social Work (2nd Edition) Copyright © 2020 by Matthew DeCarlo, Cory Cummings, and Kate Agnelli is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
