Full Stack Data Science: The Next Generation of Data Scientists Cohort
This blog post goes over what it takes to become a Machine Learning Data Scientist to keep up with the changing demand of the industry.
Data science has been an eye-catching field for many years now, especially for young people with a formal education (a bachelor’s, master’s, or Ph.D.) in computer science, statistics, business analytics, engineering management, physics, maths, or, obviously, data science. However, people hold a lot of myths about data science. It’s no longer just machine learning and statistics. Over the years, I have spoken to a lot of data science aspirants about breaking into this field. Why is there all the hype about data science? Are statistics and machine learning still enough to break into the field? Is it still going to be the future? I was in the same boat as you all, but I am now seeing first-hand how the demand has shifted for the next generation of data scientists. I am not going to teach you how to get into data science, as many people on the internet are already doing that.
Why is there all the hype about Data Science?
Everyone around the corner wants to get into data science. A few years ago, after Dr. DJ Patil and Jeff Hammerbacher coined the term Data Scientist, the field had a demand-supply problem: there were few data scientists and a lot of demand for them. But now, in 2022, the situation has turned around. The inflow of data science enthusiasts educated through formal degrees or MOOCs has increased, and the demand has grown too, but not to the same extent. The term has also grown broader and broader to cover most of the supporting functions that one needs to do data science. Some of the reasons behind the hype:
- The mystery behind the title data scientist
- High job satisfaction
- Huge business impact
- Many job sites rate it as the hottest job (Glassdoor has ranked it the hottest job in the US for the last 3 years)
- Cutting edge developments
- Increasing influx of data generation
- Many great (and not so great) schools and boot camps offering degrees in data science
- Data is beautiful!
People who call themselves Data Scientists?
Someone is going to say it, so let me spill some truth about the current industry situation. Due to the increase in demand and prestige of the shiny Data Scientist title, many companies have started renaming roles such as product analyst, business intelligence analyst, business analyst, supply chain analyst, data analyst, and statistician to data scientist, because people were leaving their jobs for companies that gave out the data scientist title for the same work. It’s all a matter of the respect a role gets from this minor change in wording. So companies have started twisting titles to make them shinier and more desirable: data scientist-analytics, product data scientist, data scientist-growth, data scientist-supply chain, data scientist-visualization, or data scientist-what not.
For people who want to do applied machine learning as a Data Scientist in 2022 without a Ph.D., there’s a lot more to it now than just knowing how to apply machine learning to datasets, which almost anyone can do today. From my experience, a few other crucial skills can help you get shortlisted and nail the interview process for data scientist roles:
- Distributed Data Processing/Machine Learning: Hands-on experience with technologies such as Apache Spark, Apache Hadoop, Dask, etc. can help you prove that you can create data/ML pipelines at scale. Experience with any one of them should be enough, but I would recommend Apache Spark (in either Python or Scala) as the go-to.
- Production ML/Data Pipelines: Get hands-on experience with Apache Airflow, a standard open-source job orchestration tool for creating data and machine learning pipelines. It is currently used in the industry, so it’s worth learning and building some projects around it.
- DevOps/Cloud: DevOps is neglected by most data science aspirants. If you don’t have infrastructure, how would you build ML pipelines? It’s not as easy as in coursework, where you build notebooks or code that runs on your local machine. The code that you write should scale across infrastructure that you or other folks on your team might create. Many companies might not have ML infrastructure laid out yet and might be looking for someone to start it. Getting familiar with Docker and Kubernetes, and building ML applications with frameworks like Flask, should be your standard practice even during your coursework. I love Docker as it’s scalable: you can build infrastructure images and replicate the same setup on servers or in the cloud on Kubernetes clusters.
- Databases: Knowing databases and query languages is a must. SQL is often neglected, but it’s still the industry standard on any cloud platform or database. Start practicing complex SQL queries on LeetCode, which will help you with part of the coding interviews for DS profiles. On the job, you will be responsible for bringing in data from warehouses, and preprocessing it on the go in SQL eases the work you have to do before running ML models. Much of the feature engineering can also be done on the go in SQL while fetching data for your models, which is an aspect many people neglect.
- Programming Languages: The recommended programming languages for data science are Python, R, Scala, and Java. Knowing any one of them is fine and can do the trick. For ML roles, there are going to be live coding rounds in the interview process, so you need to practice wherever you are comfortable: LeetCode, HackerRank, or anything else you prefer.
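To make the Databases point a bit more concrete, here is a minimal sketch of doing feature engineering in SQL while pulling the data, instead of after it. It uses Python’s built-in sqlite3 module with an in-memory database so it runs anywhere. The orders table, its columns, the sample rows, and the build_features helper are all hypothetical, made up purely for illustration; a real warehouse would have its own schema and SQL dialect.

```python
import sqlite3

# Hypothetical sample data: (id, customer, amount, order_date).
# Invented for illustration only.
ROWS = [
    (1, "alice", 20.0, "2022-01-05"),
    (2, "alice", 35.0, "2022-02-10"),
    (3, "bob", 15.0, "2022-01-20"),
    (4, "alice", 5.0, "2022-03-01"),
    (5, "bob", 45.0, "2022-03-15"),
]

# Per-customer features are computed inside the query, so the rows that
# reach Python are already aggregated and model-ready.
FEATURE_QUERY = """
SELECT
    customer,
    COUNT(*)        AS n_orders,
    SUM(amount)     AS total_spent,
    AVG(amount)     AS avg_order_value,
    MAX(order_date) AS last_order_date
FROM orders
GROUP BY customer
ORDER BY customer
"""


def build_features():
    """Load the sample rows into an in-memory table and return one
    feature row per customer."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders ("
            "id INTEGER PRIMARY KEY, customer TEXT, "
            "amount REAL, order_date TEXT)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(FEATURE_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in build_features():
        print(row)
```

The aggregation happens in the database, so only one row per customer travels to your Python code. With a real warehouse, the same idea applies: push counts, sums, averages, and recency features into the query rather than recomputing them after loading raw rows.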
So, this is a time when knowing only machine learning or statistics is not going to get you into data science to do ML, unless you are lucky, have some great connections in the industry (you should obviously network, which is very important!), or already have an exceptional research record to your name. Business applications and domain knowledge tend to come with experience and can’t really be learned beforehand, other than by doing internships in relevant industries.