TL;DR Science: Causation vs. Correlation (Part 2)

By Shang Chen
October 08, 2020 · 3 minute read


Data Science

Two weeks ago, we looked at many examples in which two variables were correlated, but one did not cause the other. Today we'll look at how researchers and scientists establish causal relationships through their experiments.

If correlations only show an association, how do we show causality?

The main way scientists establish causal relationships is through Randomized Controlled Trials (RCTs).

Randomized Controlled Trials begin by gathering a sample that is representative of the population you want to study. Your sample should be a small 'slice' of that population. In research or medicine, this means choosing participants carefully so that the sample reflects the people your research or medicine is targeting.
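One common way to make a sample reflect a population is stratified sampling: divide the population into subgroups (such as age groups) and draw the same fraction from each one. The Python sketch below is a minimal illustration under assumed conditions, not a procedure taken from any particular study; the population, the `age_group` field, the helper name `stratified_sample` and the 10% fraction are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, key, fraction, seed=None):
    """Draw roughly the same fraction of people from every subgroup (stratum).

    population: list of records (dicts in this sketch)
    key: function that returns the subgroup a record belongs to
    fraction: share of each subgroup to include, e.g. 0.1 for 10%
    """
    rng = random.Random(seed)  # seeded so the draw can be reproduced
    strata = defaultdict(list)
    for person in population:
        strata[key(person)].append(person)

    sample = []
    for members in strata.values():
        # Take a proportional number from each subgroup, but at least one person.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 600 adults aged 18-64 and 400 adults aged 65+.
population = (
    [{"id": i, "age_group": "18-64"} for i in range(600)]
    + [{"id": i, "age_group": "65+"} for i in range(600, 1000)]
)
sample = stratified_sample(population, key=lambda p: p["age_group"], fraction=0.1, seed=7)
print(len(sample), sum(p["age_group"] == "65+" for p in sample))  # prints: 100 40
```

With these made-up numbers, the sample keeps the population's 60/40 split between the two age groups. Real studies usually stratify on several traits at once, such as age, gender and health status.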

Next, split the sample into two groups: the control group and the test group. The split should be random, so that every participant has the same chance of ending up in either group; this is the "randomized" part of a Randomized Controlled Trial, and it keeps other differences between people from piling up on one side. The control group gets nothing or a placebo, while the test group receives the actual treatment. Treatment is not always a pill or an injection. Sometimes it can be as simple as eating a good breakfast the morning before a big test or getting 8+ hours of sleep the night before. After splitting up the sample, scientists gather data on both groups. Ideally, the scientists cannot tell whether they are working with the control group or the test group, which helps prevent confirmation bias (if you're unsure what this term means, look it up on Google or shoot us an email).
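The random split is what puts the "randomized" in a Randomized Controlled Trial: every participant has the same chance of landing in either group, so traits nobody measured tend to even out between the two. Here is a minimal Python sketch, assuming participants are identified by hypothetical IDs and that a simple 50/50 shuffle-and-split is acceptable; `assign_groups` is an illustrative name, and real trials often use more elaborate schemes such as block or stratified randomization.

```python
import random

def assign_groups(participants, seed=None):
    """Randomly split participants into a control group and a test group."""
    rng = random.Random(seed)      # seeded generator so the split is reproducible
    shuffled = list(participants)  # copy, so the caller's list is not reordered
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    control = shuffled[:half]      # these people get nothing or a placebo
    test = shuffled[half:]         # these people get the treatment
    return control, test

# Hypothetical participant IDs, for illustration only.
participants = [f"P{i:02d}" for i in range(1, 11)]
control, test = assign_groups(participants, seed=42)
print("Control:", control)
print("Test:   ", test)
```

Fixing the seed lets someone else reproduce the exact assignment later. In a blinded trial, the resulting group labels would be kept from the people collecting the data.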

After the data from the two groups has been gathered, it's time to analyze the results. Generally, this means comparing the data from the two groups and looking for noticeable differences. If the test group differs from the control group by more than random chance alone would explain, that is evidence that the treatment is effective.
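Deciding whether a difference is "noticeable" usually means asking how likely that difference would be if the treatment did nothing and the gap came only from how people happened to be split. A permutation test is one simple way to estimate this. The Python sketch below illustrates that technique; it is not necessarily the analysis any given trial uses, the function names are hypothetical, and the test scores are made up.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(control, test, n_permutations=10_000, seed=0):
    """Return (observed difference in means, estimated two-sided p-value).

    The p-value estimates how often re-splitting the same people at random
    produces a gap in means at least as large as the one actually observed.
    """
    rng = random.Random(seed)
    observed = mean(test) - mean(control)
    pooled = list(control) + list(test)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend the group labels were handed out again
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Adding 1 to both counts includes the observed split, so p is never exactly 0.
    return observed, (extreme + 1) / (n_permutations + 1)

# Made-up test scores, e.g. students with vs. without a good breakfast.
control_scores = [72, 68, 75, 70, 66, 74, 69, 71]
test_scores = [78, 74, 80, 76, 73, 79, 75, 77]
diff, p_value = permutation_test(control_scores, test_scores)
print(f"difference in means: {diff:.3f}, p-value: {p_value:.4f}")
```

A small p-value (0.05 is a common, if somewhat arbitrary, cutoff) suggests the gap is unlikely to be a fluke of the random split. On its own, it does not prove the treatment works.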

Take, for example, an experiment testing to see if a possible COVID-19 vaccine is effective. How would this pan out in an RCT?

First, we would gather a sample representative of the population we want to use the vaccine on. This sample would need to include people of all ages, genders, and fitness levels. We want to show that the vaccine is effective across everyone in our population, not just a small group such as healthy teenagers.

Next, we randomly split them into the two groups and give one group the vaccine while the other gets a placebo. We would then force our two groups to go out and contract COVID and see if our vaccine worked or not... Just kidding. There are obvious limitations to what we can test in an RCT, and in some cases, like our COVID vaccine, it is not ethical to test the vaccine by deliberately exposing the two groups to the virus. Instead, researchers may choose a different way of measuring the vaccine's effectiveness, such as measuring antibody production in the two groups. If the people who got the vaccine produced significantly more antibodies than the control group, that is a possible indication that the vaccine works.

Even then, a noticeable difference between your treatment and control groups is not necessarily enough to conclude that your treatment is effective. There may be factors you overlooked that influenced the results of your experiment. The purpose of an RCT is to eliminate or minimize these factors as much as possible, but even in the most controlled experiments, scientists still need to repeat their trials many times and measure a variety of variables before they can conclude with confidence that their treatment is effective.
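To see why random assignment helps with overlooked factors, here is a small Python simulation with entirely made-up numbers. It assumes a hidden trait (called `fitness` here, a hypothetical stand-in for any overlooked factor) that raises the outcome, and a treatment that has no real effect at all. When fitter people choose the treatment themselves, the treatment looks effective; when a coin flip decides, the apparent effect shrinks toward zero.

```python
import random

def simulate(n=10_000, seed=1):
    """Simulate a treatment with NO real effect, grouped two different ways.

    Returns (apparent effect with self-selected groups,
             apparent effect with randomized groups).
    """
    rng = random.Random(seed)
    people = [rng.gauss(0, 1) for _ in range(n)]  # hidden fitness score per person

    def outcome(fitness):
        # The outcome depends on fitness plus noise, but NOT on treatment.
        return 2.0 * fitness + rng.gauss(0, 1)

    def difference(treated_flags):
        treated = [outcome(f) for f, t in zip(people, treated_flags) if t]
        untreated = [outcome(f) for f, t in zip(people, treated_flags) if not t]
        return sum(treated) / len(treated) - sum(untreated) / len(untreated)

    self_selected = [f > 0 for f in people]            # only the fitter half opts in
    randomized = [rng.random() < 0.5 for _ in people]  # a coin flip per person
    return difference(self_selected), difference(randomized)

biased, unbiased = simulate()
print(f"self-selected groups: apparent effect = {biased:.2f}")
print(f"randomized groups:    apparent effect = {unbiased:.2f}")
```

The self-selected comparison is a correlation between treatment and outcome without any causation, the same kind of trap as the examples from two weeks ago. Randomization removes the bias on average, though any single trial can still show a small gap by chance, which is one reason trials are repeated.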


Randomized Controlled Trials are the 'gold standard' for establishing causality between two variables, but even they have their limitations. Begin by collecting a representative sample and randomly splitting it into two groups. Apply the treatment to one group and give the other nothing or a placebo. Then measure the differences between the control and test groups and decide whether further measurements or trials are necessary.

Note: sometimes RCTs are impossible due to ethical or practical concerns, and researchers must find alternative ways to establish causality.


About The Author

Shang Chen is on the executive team of SciTeens and is studying Data Science and Economics at UC Berkeley. His hobbies include working out, cooking, and speedrunning video games. Feel free to reach out to him with comments, questions, and future article recommendations at 
