Data Science is booming. Not only is it fun to do, but it also has amazing applications in almost any domain or industry. In all my professional activities I try to promote Data Science as much as I can, in a way that is accessible to all. I also try to fill in knowledge gaps, in particular in business applications of data science. That’s how I arrived at the idea of creating a Data Science Crash Course.
So how do you go about creating your own course? It wasn’t that hard for me. As a PhD in math, I had many chances to create university-level courses for my students. But I wanted something different this time: quick, focused, efficient. That’s why I decided to build this course.
Its goal is to make you comfortable with fundamental Data Science concepts and tools that you’ve probably heard about many times already: supervised learning, unsupervised learning, Anaconda, Python, pandas, JSON, NumPy, scikit-learn, etc. I explain this in this intro:
From there it only gets better.
The most basic tool for Data Science is Python, and once you install the Anaconda distribution for it, you’ll be set to start experimenting.
The next step is to revise some linear algebra and statistics. You need to be able to operate on vectors and matrices, know what mean, variance and bias are, and be able to compute some probabilities.
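To make this concrete, here is a minimal NumPy sketch of the kind of operations meant above. The vector, matrix and sample values are made up for illustration and are not taken from the course videos.

```python
import numpy as np

# A small vector and matrix (illustrative values only).
v = np.array([1.0, 2.0, 3.0])
A = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])

Av = A @ v    # matrix-vector product
dot = v @ v   # dot product of v with itself

# Mean and variance of a made-up sample.
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mean = sample.mean()
pop_var = sample.var()            # divides by n (biased as an estimator)
sample_var = sample.var(ddof=1)   # divides by n - 1 (unbiased estimator)

# A simple empirical probability: the fraction of values greater than 4.
p_gt_4 = (sample > 4).mean()

print(Av, dot, mean, pop_var, sample_var, p_gt_4)
```

The two variance calls differ only in the divisor, which is one simple place where the idea of bias shows up in practice.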
After those preparations, we can get into the main part. The first step is processing data: importing it from JSON, XML, spreadsheets and text files. Once you import it, you have to store it somewhere: NumPy arrays and pandas DataFrames work well.
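A rough sketch of what importing and storing can look like. The JSON and CSV strings below are hypothetical stand-ins for real files, so that the example runs on its own.

```python
import io
import json

import pandas as pd

# Hypothetical JSON payload standing in for a file on disk.
raw = '[{"name": "Ada", "age": 36}, {"name": "Alan", "age": 41}]'
records = json.loads(raw)      # a list of dicts

# Store the records in a pandas DataFrame.
df = pd.DataFrame(records)

# Pull one column out as a NumPy array for numerical work.
ages = df["age"].to_numpy()
mean_age = ages.mean()

# Text data: a small CSV, read from an in-memory buffer instead of a file.
csv_text = "name,age\nGrace,45\nEdsger,72\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df)
print(mean_age)
print(df_csv)
```

For real files, pandas also offers pd.read_json, pd.read_excel and pd.read_xml. The last two may need extra packages installed, depending on your setup.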
Now it’s time to get some actual data to experiment on. There are a couple of sources you might want to use:
You’re good to go!
Now it’s time to get to the algorithms. You have data and you know how to process it, so let’s do something fun with it. I talk about the difference between supervised and unsupervised learning, that is, classification vs clustering problems, or in other words, whether your dataset has labels or not.
There are many standard techniques in both of those situations and I cover them in the videos.
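To illustrate the labels vs no-labels distinction, here is a minimal scikit-learn sketch. The synthetic dataset and the two algorithms (k-nearest neighbours and k-means) are my choices for this example and not necessarily the ones covered in the videos.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 2D points in three groups; y holds the true group of each point.
X, y = make_blobs(n_samples=150, centers=3, random_state=0)

# Supervised (classification): the labels y are used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)   # fraction of correct test predictions

# Unsupervised (clustering): only X is used, the labels are never shown.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)    # one cluster index per point

print("classification accuracy:", accuracy)
print("first ten cluster ids:", cluster_ids[:10])
```

Note that the cluster ids are arbitrary: cluster 0 from k-means need not correspond to label 0 in y.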
The final part of doing Data Science is communicating your results to others. The most effective way to do that is to visualize them: either plot them on a graph or put together a report. Good tools for that are matplotlib and Dash, and then of course you can put your code on GitHub.
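As a small example of the plotting side, here is a matplotlib sketch that draws a bar chart and renders it as a PNG. The model names and scores are made up for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt

# Made-up results to visualize.
labels = ["model A", "model B", "model C"]
scores = [0.72, 0.81, 0.78]

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(labels, scores)
ax.set_xlabel("Model")
ax.set_ylabel("Accuracy")
ax.set_title("Example results (made-up numbers)")
ax.set_ylim(0, 1)
fig.tight_layout()

# Render to an in-memory PNG; pass a filename such as "results.png"
# to fig.savefig instead if you want a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
```

In a Jupyter notebook you would usually skip the backend line and let the figure display inline.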
And that concludes our 60-minute course!
If you followed it through and watched all the videos, you should be ready to tackle some real-life problems with Data Science.