TL;DR Math: The Normal Distribution

By Thomas P.
January 06, 2023 · 4 minute read

The Basics

Statistics is applied mathematics: a way of organizing data and putting it into perspective. One way we can summarize data is by putting the numbers on a number line and finding a single number that represents the center of the data. We call this value the average or the mean, which is calculated by dividing the sum of the numbers (or “terms”) by the number of terms.

Sample mean (x̄) = sum of terms / number of terms  (x̄ = sample mean, µ = population/expected mean)

Similarly, one can find roughly how far a typical value on that number line is from the mean. This is a measure of how spread out the data are around the mean: the larger it is, the more widely the values are scattered. This measure is called the standard deviation (deviation here means distance from the mean).

Because the calculation of standard deviation is typically quite tedious, it is normal to use a calculator to complete it.
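To show what the calculator is doing behind the scenes, here is a minimal Python sketch of the standard deviation worked step by step. The function name `sample_std_dev` and the quiz scores are made up for illustration, and the sketch assumes the common "sample" convention of dividing by n - 1 rather than n.

```python
import math

def sample_std_dev(values):
    """Sample standard deviation (assumes the n - 1 convention)."""
    n = len(values)
    mean = sum(values) / n                              # step 1: find the mean
    squared_devs = [(v - mean) ** 2 for v in values]    # step 2: squared distance of each value from the mean
    variance = sum(squared_devs) / (n - 1)              # step 3: average the squared distances (dividing by n - 1)
    return math.sqrt(variance)                          # step 4: take the square root

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical quiz scores
print(round(sample_std_dev(scores), 3))  # 2.138
```

Squaring the distances keeps values below the mean from cancelling out values above it, and the square root at the end brings the result back to the same units as the data.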

It may not be obvious yet how these two measures are connected or why they are so widely used. Let me assure you, however, that both are fundamental to understanding the rest of statistics.

Example:

I entered the following data set into my TI-Nspire:

In this example, we have a sample called x of four numbers: 1, 4, 5, and 6. Here are the calculations of the mean and standard deviation worked out: the mean is (1 + 4 + 5 + 6) / 4 = 4, and the sample standard deviation is √(14 / 3) ≈ 2.16.
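If you do not have a graphing calculator handy, the same example can be checked with Python's built-in `statistics` module. This is only a sketch of one way to do it; the variable names are my own. Calculators such as the TI-Nspire usually report two standard deviations, one for a sample (dividing by n - 1) and one for a whole population (dividing by n), so both are shown.

```python
import statistics

x = [1, 4, 5, 6]  # the sample from the example

mean = statistics.mean(x)              # sum of terms / number of terms
sample_sd = statistics.stdev(x)        # sample standard deviation (divides by n - 1)
population_sd = statistics.pstdev(x)   # population standard deviation (divides by n)

print(mean, round(sample_sd, 2), round(population_sd, 2))  # 4 2.16 1.87
```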

The Normal Distribution

The normal distribution is a well-known probability distribution that describes how some data sets are spread out. Note that the normal distribution does not occur in all data sets; examples of where it approximately occurs include data about heights or weights. The shape of the normal distribution is as follows:

Attribution: Wikimedia Commons, D Wells

  • ~68% of observations should be within one standard deviation of the mean.
  • ~95% of observations should be within two standard deviations of the mean.
  • ~99.7% of observations should be within three standard deviations of the mean.
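These percentages can be checked numerically. The sketch below uses `NormalDist` from Python's standard library (available in Python 3.8 and later); the helper name `share_within` is my own. It measures the area under the curve between -k and +k standard deviations.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The same proportions hold for any normal distribution, whatever its mean and standard deviation, which is why the rule is stated in terms of standard deviations rather than raw values.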

Z-Scores

By using the normal distribution, we can infer how much of the data lies above or below a given value. We can do this by using the standard score (z-score), which measures how many standard deviations a value is from the mean and can be used to compare data on one side of the distribution to data on the other. For example, a value with a z-score of -1 is closer to the mean than a value with a z-score of 2.5.

z = (data value - mean value) / standard deviation

Or, in symbols,

z = (x - µ) / σ

To clarify, a normal distribution does not need to have a mean of 0 and a standard deviation of 1 in order for its values to be converted into z-scores. The above formula automatically standardizes all of the values, so that the resulting z-scores follow a normal distribution with a mean of 0 and a standard deviation of 1. For example, if we have a normal distribution of heights with a mean of 1.8 meters and a standard deviation of 0.3 meters, we could calculate the z-scores for all of our data points and turn this…

Into this…
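The same standardization can be sketched in a few lines of Python. The function name `z_score` and the list of heights are hypothetical; only the mean of 1.8 meters and the standard deviation of 0.3 meters come from the example above.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

heights_m = [1.2, 1.5, 1.8, 2.1, 2.4]  # hypothetical heights in meters
z_scores = [z_score(h, mean=1.8, sd=0.3) for h in heights_m]

print([round(z, 2) for z in z_scores])  # [-2.0, -1.0, 0.0, 1.0, 2.0]
```

A height equal to the mean gets a z-score of 0, and each step of 0.3 meters away from the mean changes the z-score by 1.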

Applying the normal distribution

We can infer values within a normal distribution by using either the normal cumulative distribution function (normalcdf / normcdf) operation or the inverse normal (invNorm) operation. Normalcdf determines how much of the data falls within a certain range, for example below a certain threshold (the calculations below provide a decimal, which can then be converted into a percent). Inverse normal works in the opposite direction: given a proportion of the data, it tells you the value (or z-score) below which that proportion of the normal distribution lies.

With normalcdf, one must input the lower threshold and the upper threshold, as well as the mean and standard deviation of the distribution, in order to find the proportion of data between those thresholds. The calculator will typically return a decimal between 0 and 1, which corresponds to a percentage between 0% and 100%.

With inverse normal, one must provide a value between 0 and 1 (i.e. between 0% and 100% of the data), along with the mean and standard deviation of the distribution, to find the upper threshold of the range that contains that proportion of the data.
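For readers without a calculator, both operations can be approximated with `NormalDist` from Python's standard library. The wrapper names `normalcdf` and `inv_norm` below are my own, chosen to mirror the calculator commands; they are not part of any library. The usage lines reuse the heights distribution (mean 1.8 meters, standard deviation 0.3 meters) as an assumed example.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Proportion of a normal distribution that lies between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area_below, mean=0.0, sd=1.0):
    """Value below which the given proportion (strictly between 0 and 1) of the data lies."""
    return NormalDist(mean, sd).inv_cdf(area_below)

# Proportion of heights between 1.5 m and 2.1 m (one standard deviation either side of the mean)
share = normalcdf(1.5, 2.1, mean=1.8, sd=0.3)
print(round(share, 4))  # 0.6827

# Height below which half of the data lies (the mean, for a normal distribution)
cutoff = inv_norm(0.5, mean=1.8, sd=0.3)
print(round(cutoff, 2))  # 1.8
```

For a one-sided question such as "how much of the data is below 1.5 m", pass `float("-inf")` as the lower threshold.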

Here are some example calculations of normalcdf and invNorm. Note that these operations are hard to do without a scientific calculator.

We can visualize a couple of the above operations as the following: 

Ex: normalcdf(lower: 3, upper: pos. infinity, mean: 1, SD: 5) can be visualized as the left image. It shows that about 34.46% of the data (area) is above the value of 3 (which sits at a z-score of 0.4).

Ex: invNorm(proportion of data below = 0.9987, mean = 0, SD = 1) can be visualized as the right image. It shows that 99.87% of the data is below the value of about 3.011, matching the result given by the calculator.
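As a cross-check, the two examples can be reproduced with Python's standard-library `NormalDist`. This is a sketch using my own variable names, not the calculator's method. The area above 3 is found as one minus the area below 3.

```python
from statistics import NormalDist

# normalcdf(lower: 3, upper: pos. infinity, mean: 1, SD: 5)
area_above_3 = 1 - NormalDist(mu=1, sigma=5).cdf(3)
print(round(area_above_3, 4))  # 0.3446

# invNorm(proportion of data below: 0.9987, mean: 0, SD: 1)
value = NormalDist(mu=0, sigma=1).inv_cdf(0.9987)
print(round(value, 3))
```

The second result should agree with the value of roughly 3.011 quoted above, up to rounding.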

Further reading

https://www.investopedia.com/terms/n/normaldistribution.asp 

https://towardsdatascience.com/clearly-explained-normal-distributions-and-the-central-limit-theorem-8d7cc5a6052f 

https://www.stapplet.com/normal.html

About The Author

Thomas is a student at Eastside High School.
