This is the typical, overused diagram that you see in every data science introduction. In this chart, we can see three different domains: computer science skills, statistical skills, and domain expertise.
Then, in the middle of it all, we place the prestigious data scientist, which I think is now an oversold title that is widely abused by startups to attract talent. While this Venn diagram is a good introduction to data science, it lacks the critical mindset of a value proposition.
Just as any business needs to draft a business model before investors will fund it, data science needs a value proposition. Without the value-add that your customers need, you cannot explain the real work of data science. Here are a few problems with the typical data science Venn diagram.
So I want to introduce an analogy that tells you more about what it takes to become a great data scientist.
By learning from a pianist.
One day, I went to support a good friend of mine, who is an excellent pianist. After his piano concert, I congratulated him. We chatted for a while until he asked me, “What does a data scientist do?” I reflected and cooked up a metaphor to explain. Now I would like to share it with you.
I hope it helps you understand the core mindset of data science as a profession.
During the Warring States Period, there was a musician named Gongming Yi, who played musical instruments well. A great number of people were fond of listening to him play and respected him greatly.
One day, Gongming Yi saw a cow when he was relaxing in the countryside. He thought, “Everybody compliments my music. Why don’t I play some music for this cow?”
He played a piece of elegant quaint music for the cow, but the cow just kept grazing the grass with its head down. He played another piece of joyful music, but the cow still kept its head down to graze the grass and totally ignored him. Gongming Yi showed off all his skills, but the cow still ignored him.
He then tweaked his playing to mimic the sound of a calf. And the cow reacted.
This is a simple and great lesson. No matter how complex your solution is, you will get no results if your stakeholders or audience do not understand it. The key result of data science is data-driven decision making. To achieve that, you need your key stakeholders' approval and support.
This includes the users of your ML app and even the engineers who support its launch. You should not overthink. Develop simple solutions that aim for explainability rather than complexity.
For example, if the project is small and your stakeholders require a proof of concept to decide whether it should continue, do not jump at deep learning as your first bet. Instead, understand the problem and the data, and create an interactive dashboard to extract insights.
Do what is beneficial and enough for your stakeholders and not for your ego.
Similarly, whenever I receive a new project, I always highlight the Minimum Viable Product (MVP) and venture into complex machine learning or deep learning only once I have stakeholder buy-in.
This means I start with exploratory analysis. I classify results with threshold-based rules (e.g., a decision tree) to deliver 80% of the impact. Always try an 80/20 Pareto comparison to confirm you are heading in the right direction.
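As a minimal sketch of that rule-first baseline, here is what a single threshold rule can look like. Everything in it is an assumption for illustration: the `best_threshold` helper, the `scores`, and the `flagged` labels are hypothetical toy values, not data from any real project. The helper simply tries every observed value as a cutoff and keeps the one that classifies the most examples correctly, which is similar in spirit to the single split a very shallow decision tree would make.

```python
from typing import List, Tuple


def best_threshold(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, accuracy) for the rule `value >= threshold -> 1`.

    Tries each distinct observed value as a cutoff and keeps the first one
    that reaches the highest accuracy.
    """
    best_t, best_acc = values[0], 0.0
    for t in sorted(set(values)):
        # Count examples where the rule's prediction matches the label.
        correct = sum((v >= t) == bool(y) for v, y in zip(values, labels))
        acc = correct / len(values)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical toy data: a made-up risk score per item and whether it was flagged.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
flagged = [0, 0, 0, 1, 0, 1, 1, 1]

threshold, accuracy = best_threshold(scores, flagged)
print(f"rule: score >= {threshold} -> flagged (accuracy {accuracy:.3f})")
```

On this toy data the rule settles on a cutoff of 0.4 and gets 7 of the 8 examples right. A rule like this is easy to explain to stakeholders, and its accuracy gives you a number that any more complex model has to beat before it is worth the extra cost.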
If you ignore this, you will incur huge costs for your stakeholders and not necessarily deliver the impact you promised in your solution. “Netflix never used its $1 million algorithm due to engineering costs.” This could be your next headline.
Understand for whom you are playing your music. That is the key to becoming a great data scientist.
Can I learn data science without knowing statistics? Can I implement deep learning and call myself a data scientist?
The simple answer is … you CAN.
You can run a regression without knowing how regression works. You can plot the data and make sense of it. You can even follow a Keras tutorial to implement deep learning quickly for a Kaggle competition.
After all, isn’t that how school teaches us? Cookie-cutter result production?
Imagine a piano tab
You play Winter Sonata. With piano tabs, you can impress your girlfriend by following, key by key, what the author instructs you to play. You don’t even need to read a single musical note.
But are you a great musician?
NO. True, you can play music, but you are not going to be a great pianist. Piano tabs are training wheels. You never hear of a professional cyclist who trains with training wheels.
You can train and play any song you want. But your understanding will be shallow if you don’t know how to read musical notes. Unless you understand musical notes, you will never arrange your own piece.
Similarly, a great data scientist will iron out and sharpen their skills on these points with great effort and in great detail:
Similarly, can you become a great data scientist without statistics?
Can you become a data scientist by working on a project yourself?
Data science is always about teamwork. In a choir, there is an ensemble of sopranos, altos, tenors, and basses. None of them can stand on its own, but together they produce harmony.
Similarly, in data science you will find engineers who launch your proof of concept, product managers who look at your analysis and make decisions, and even other data scientists who have different expertise.
As data scientists, we cannot stand on our own. We can build infrastructure and models. But we will fail if we do not respect one another and synchronize our work.
Similarly, I am blessed to be always surrounded by colleagues who know more than I do. They have different specialties (Yara/ Yodalog/ ML Ops) that help me when I venture into the unknown. We synchronize our work through product launches.
Just as tones form harmony, synchronized work delivers impact.
In summary, by learning from a pianist, you have gained more perspectives on becoming a great data scientist:
Soli Deo Gloria
Vincent fights internet abuse with ML @ Google. Vincent uses advanced data analytics, machine learning, and software engineering to protect Chrome and Gmail users.
Apart from his stint at Google, Vincent is also a featured writer for Towards Data Science on Medium, where he guides aspiring ML and data practitioners and has 500k+ viewers globally.
In his free time, Vincent studies for a master's degree in ML at Georgia Tech and trains for triathlons and cycling trips.