TL;DR Majors: Data Science
By Shang Chen
September 09, 2020 · 3 minute read
Data Science is pretty much exactly what you think it is: the science of data. It breaks down into many different subfields and even more specializations. Still, this article will focus on some of the general themes across all data science disciplines and give you some insight into whether Data Science would be a good fit for you.
What is Data Science?
To give a formal definition, Data Science is “learning about the world from computation made from data.” It’s no secret that computers have advanced immensely in the last half-century. Data Science seeks to use that computing power to gather, synthesize, and interpret data sets far larger than anyone could process by hand. While the first computers filled whole office spaces just to complete basic arithmetic, today’s supercomputers and artificial intelligence programs make it possible to analyze enormous data sets in useful ways.
One concrete example of how data science impacts our lives is in the area of buying and selling goods online (think eBay, Amazon, etc.). I’m sure you’ve occasionally come across an ad for a purse, shoe, or video game you were browsing a couple of days ago and wondered whether it was a coincidence. The chances are that it wasn’t. Companies of all sizes are spending millions of dollars to better understand the habits and tastes of their consumers. Data Scientists help collect and analyze data from millions of search engine results per hour so that companies like Instagram and Facebook can personalize even the advertisements you see as you scroll through your feed.
While there might not be a ‘Data Science Class’ at your high school, the concepts that Data Scientists rely on are rooted in many of the more widely available STEM classes around the world. The two most important courses I would recommend for aspiring Data Science majors are Statistics and Computer Science. Data Scientists use concepts from both of these fields daily, often at the same time. Statistical analysis is the backbone of the theory behind collecting and organizing good sets of data, while programming skills are necessary to sift through the data effectively. By getting a head start on these skills in high school, you will be able to go into your first Data Science classes with confidence and with familiarity with many of the processes and techniques Data Scientists use.
What are the steps of the Data Science Method?
The general steps Data Scientists use are as follows:
- Explore - Figure out a question that you want to answer or a problem that you want to solve using data. This can range from something as simple as how many squirrels live in Central Park to something as complex as pinpointing areas that may be at risk for COVID-19 outbreaks. After Data Scientists have decided what they want to know, they need to collect and visualize the relevant data. Frequently you’ll see these visualizations in the form of graphs or charts.
- Inference - Once data scientists have collected data and generated models, the next step is to look at the data and determine what conclusions can be drawn from what we know. This step lies at the heart of any good data analysis. As crucial as interpreting data is, it’s also vital for data scientists to be able to identify areas where bias may occur. Problems in datasets can arise in many different ways, and they range from random errors in measurement (sometimes referred to as ‘noise’ in the data) to more systematic biases. Systematic biases are dangerous because they can skew the data consistently in one direction.
- Prediction - After the conclusions from the data have been drawn, it’s time to remember why we bothered gathering the data in the first place: to solve a problem or answer a question. After the inference phase, it’s time to apply what we learned to similar issues or start expanding the scope of the research. If your data was initially focused on how to prevent COVID-19 outbreaks, the next step might be tweaking the model to help prevent outbreaks of other infectious diseases in the future. Going back to our example about calculating how many squirrels live in Central Park, perhaps a natural application of your methodology would be to help track the population of an endangered species in a larger area.
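To make the three steps above a little more concrete, here is a minimal Python sketch built around the squirrel example. Everything in it is an assumption made for illustration: the true density, the noise level, the size of the systematic bias, and the function names (`survey`, `estimate_density`, `predict_population`) are all made up, and the data is simulated rather than collected. It is only meant to show how random noise and systematic bias affect an estimate differently, not how a real population study would be run.

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulated survey is reproducible

# All numbers below are hypothetical, chosen only for illustration.
TRUE_DENSITY = 8.0  # pretend "true" squirrels per hectare
N_PLOTS = 30        # number of one-hectare plots we pretend to survey


def survey(n_plots, true_density, noise_sd=2.0, systematic_offset=0.0):
    """Explore: simulate counts per plot.

    noise_sd adds random measurement error ("noise");
    systematic_offset shifts every count the same way (a systematic bias).
    """
    return [
        true_density + random.gauss(0, noise_sd) + systematic_offset
        for _ in range(n_plots)
    ]


def estimate_density(counts):
    """Inference: estimate the average density and its standard error."""
    mean = statistics.mean(counts)
    std_err = statistics.stdev(counts) / len(counts) ** 0.5
    return mean, std_err


def predict_population(density, area_hectares):
    """Prediction: scale the estimated density up to a larger area."""
    return density * area_hectares


# A survey with only random noise, and one that also undercounts every plot.
fair_counts = survey(N_PLOTS, TRUE_DENSITY)
biased_counts = survey(N_PLOTS, TRUE_DENSITY, systematic_offset=-3.0)

fair_mean, fair_se = estimate_density(fair_counts)
biased_mean, biased_se = estimate_density(biased_counts)

print(f"Noise only:      {fair_mean:.2f} +/- {fair_se:.2f} squirrels per hectare")
print(f"Systematic bias: {biased_mean:.2f} +/- {biased_se:.2f} squirrels per hectare")
print(f"Predicted population for 100 hectares: {predict_population(fair_mean, 100):.0f}")
```

In this sketch the random noise tends to average out as more plots are surveyed, so the first estimate lands near the made-up true density. The systematic offset does not average out: the second estimate stays too low no matter how many plots are added, which is why that kind of bias is the more dangerous one.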
Data Science is a truly impactful field that opens up a world of opportunities for you. You can decide to specialize in almost any field, be it STEM or otherwise. Data Scientists can choose to enter research institutions and focus on crunching vast amounts of data to predict significant events like who will win the next presidential election. Other Data Scientists may decide to enter the finance industry and take on the challenges of trying to model and predict the stock market. The opportunities are endless.
TL;DR
Data Science is a field that combines the strengths of Computer Science and Statistics to solve real-world problems with computation. It has a wide range of applications, from STEM fields to marketing to environmental work. Students considering going into Data Science should be comfortable knowing they can apply the techniques of Data Science to almost any field of study.
Sources:
https://ischoolonline.berkeley.edu/data-science/what-is-data-science/
https://www.innoarchitech.com/blog/what-is-data-science-does-data-scientist-do
About The Author
Shang Chen is on the executive team of SciTeens and is studying Data Science and Economics at UC Berkeley. His hobbies include working out, cooking, and being bad at chess. Feel free to reach out to him with comments, questions, and future article recommendations at Shang@SciTeens.org.