SciTeens Online: Data Science Curriculums

By Shang Chen
March 10, 2021 · 2 minute read

Hello everyone! This week’s article is a primer of sorts for the upcoming SciTeens Online Data Science curriculums. If you haven’t heard already, SciTeens Online is a week-long data science program that teaches high school students the skills needed for advanced research, such as data exploration, plotting, statistical testing, and data sorting. For students who want to learn more about Data Science as a major, check out the TL;DR Majors post on Data Science here.

To begin our journey into data science, we’ll cover some basics like importing a dataset and creating visualizations. For a relevant dataset, we'll use open-source COVID-19 statistics from The Covid Tracking Project by The Atlantic. How do we get this dataset from the website, and how might we go about analyzing it? The first step is to read in our data and take a look at its first few rows.
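As a minimal sketch of this step, assuming pandas is installed, the snippet below reads CSV data and prints its first rows. The small inline sample and every column name other than dataQualityGrade are assumptions made for illustration. In practice you would pass the CSV download link from the project's website to read_csv instead of the in-memory text.

```python
import io

import pandas as pd

# A tiny inline sample standing in for the real file. The column names
# (date, state, positive, negative) are assumptions for illustration;
# dataQualityGrade is the column mentioned in the article.
SAMPLE_CSV = """date,state,positive,negative,dataQualityGrade
2020-03-01,CA,10,100,B
2020-03-02,CA,15,150,A
2020-03-03,CA,25,240,A
2020-03-01,NY,5,50,A
2020-03-02,NY,12,90,A
2020-03-03,NY,30,200,B
"""

# pd.read_csv accepts a file path, a file-like object, or a URL, so the
# StringIO object below could be swapped for the dataset's download link.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["date"])

# head() returns the first five rows by default.
print(df.head())
```

Calling df.head(10) would show the first ten rows instead, and df.columns lists every column name in the file.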

We’re not going to get too in-depth in this article, but there are many kinds of analysis we could do with this dataset. We can have pandas generate some basic statistics for the numeric columns with the .describe() method. It reports the count, mean, standard deviation, minimum, maximum, and quartiles of each column; the 50% row is the median. If you're wondering what other methods you can call, search for the pandas documentation for a complete list of the functions available for data analysis.
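Here is a small sketch of what .describe() returns, using a made-up DataFrame with hypothetical positive and negative columns rather than the real dataset:

```python
import pandas as pd

# Hypothetical numbers, used only to show the shape of the output.
df = pd.DataFrame({
    "positive": [10, 15, 25, 5, 12, 30],
    "negative": [100, 150, 240, 50, 90, 200],
})

# describe() summarizes every numeric column in one table.
summary = df.describe()
print(summary)

# Individual statistics can be read from the table by row label.
mean_positive = summary.loc["mean", "positive"]
median_positive = summary.loc["50%", "positive"]  # the 50% row is the median
print(mean_positive, median_positive)
```

The result is itself a DataFrame, so you can index into it, save it, or plot it like any other table.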

Since this is still a primer, we’ll keep it simple and analyze the number of positive and negative cases over time, from the beginning of our dataset (March 2020) to the end (Jan. 2021). Let’s clean up our data to keep only the columns we want and filter it for rows where dataQualityGrade was 'A'. Then we can create a line plot of negative and positive cases over the course of our dataset.
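The cleaning and plotting steps might look roughly like the sketch below. It assumes pandas and matplotlib are installed and uses a handful of made-up rows. The date, state, positive, and negative column names are assumptions, and adding up the states for each date is just one way to get a single line per column, not necessarily how the original analysis did it.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; remove this line in a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows; the real dataset has many more rows and columns.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03",
                            "2020-03-01", "2020-03-02", "2020-03-03"]),
    "state": ["CA", "CA", "CA", "NY", "NY", "NY"],
    "positive": [10, 15, 25, 5, 12, 30],
    "negative": [100, 150, 240, 50, 90, 200],
    "dataQualityGrade": ["B", "A", "A", "A", "A", "B"],
})

# Keep only the columns we want information from.
cleaned = df[["date", "positive", "negative", "dataQualityGrade"]]

# Keep only the rows whose data quality grade is 'A'.
grade_a = cleaned[cleaned["dataQualityGrade"] == "A"]

# Add up the remaining rows for each date so there is one point per day.
daily = grade_a.groupby("date")[["positive", "negative"]].sum()

# Draw one line per column, with the dates along the x-axis.
ax = daily.plot(kind="line", title="Positive and negative cases over time")
ax.set_xlabel("Date")
ax.set_ylabel("Cases")
plt.savefig("covid_cases.png")
```

The boolean filter inside the square brackets is the key idea: pandas keeps only the rows where the condition is true.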

Don’t worry if you’re not completely sure about some of the commands we ran to create the graphs or filter the data. Our curriculum covers most of the basic plotting and visualization techniques you will need to run this sort of analysis on a dataset of your choosing, and it will give you resources to explore datasets in interesting ways.

You’ve just learned how to read in a file from a website, perform some basic exploratory analysis, and filter the data to create valuable visualizations. For more techniques and a deeper understanding of how to further break down datasets, be sure to take a look at SciTeens’ free online curriculum and check out the www.SciTeens.org website for more resources!

About The Author

Shang Chen is on the executive team of SciTeens and is studying Data Science and Economics at UC Berkeley. His hobbies include working out, cooking, and playing video games. Feel free to reach out to him with comments, questions, and future article recommendations at Shang@SciTeens.org.
