Building Up Our Data Scientists: From Learner to Full License

We don’t hire data scientists, we build them.

At Data Science Dojo, we invest in up-and-coming data scientists who have the potential to grow into master Jedi with the breadth and depth to work on almost any data problem. We often find people who have solid knowledge of data science but lack the real-world experience to know how to tackle complex problems that go beyond the textbook.

That’s why we are launching a new program to coach up-and-coming data scientists and create masters of the field. Instead of forever searching for the ‘data scientist unicorn’, we decided to work with people who show great potential. Investing in great potential now can lead to greater outcomes in the future.

To be considered worthy of free training, exposure to real-world data science problems, and in-person mentoring, while getting paid, you need to have the following attributes:

  • Solid understanding of math/stats, including machine learning concepts and key algorithms, probability, data distributions, linear regression, statistical inference, hypothesis testing, and confidence intervals.
  • Solid coding skills, including the ability to adhere to best practices, formats and presentation.
  • Strong written and verbal communication skills, including the ability to write technical content, communicate insights, and present in front of an audience.
  • Ability to hack away at different APIs, tools and functions.
  • Ability to wrangle data to prepare it for analysis.
  • Ability to handle ambiguity in project requirements.

Getting Started: Become a data science trainee and learn some real-world data science skills

The critical mass of skills required to be an effective data scientist is non-trivial. Motivated young professionals attend course after course, only to realize that real-world problems are different from the ones in online courses. We have faced this problem while hiring and accepted that ‘real-world data scientists’ are a hard-to-find commodity. Even if you have finished 100 MOOCs on data science, we will ask you to put those skills into practice during your first six months at Data Science Dojo. As our data scientist trainee, you will learn at our expense, ramping up on different aspects of becoming a great data scientist:

  1. Variety of datasets: A wide variety of datasets are available in the public domain for data scientists to practice their skills. You will be exposed to some of the available datasets in increasing order of complexity. You can find a list of the existing datasets here. [Add link here]
  2. Tools and SDKs: Which one is better: R or Python? Should I use AWS, Azure or GCP? How about SageMaker, TensorFlow and Caffe2? You will get working knowledge of most of these tools while building models or gathering actionable insights on a variety of different datasets.
  3. Machine Learning and Modelling Chops: Data science is not just about model building. In fact, real-world data science problems follow what we call the 80/20 rule: for any non-trivial, real-world problem, you will spend more than 80% of your time acquiring, cleaning, processing, and storing data and extracting meaningful features from it. Only after that will you be able to gather actionable insights from the data. Under the supervision of seasoned data scientists, you will be asked to take on a variety of tasks in the data science lifecycle. You will learn the following skills on datasets of varying difficulty:
  • Data Exploration, Visualization and Feature Engineering: Can you slice and dice data so that it makes sense? You will learn some of the common data exploration and visualization packages, along with different feature engineering techniques, on a variety of datasets.
  • Machine Learning: Anyone can build a model that can differentiate between cats and dogs using an off-the-shelf library, but can you build a model? You will gain a solid understanding of when to use supervised vs. unsupervised learning and of how ranking and regression are related.
  • Size of datasets:

Techniques: No two problems are the same. You will be working on a variety of datasets with problems ranging from classification to clustering, regression to ranking, and outlier detection to dimensionality reduction. Parametric or non-parametric, discriminative or generative, algebraic or probabilistic: all techniques are fair game, because in the real world you need a whatever-it-takes, can-do mindset.
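To make the 80/20 point above concrete, here is a minimal sketch of the kind of wrangling that usually comes before any model is built. It is not part of our training material: the records, field names, and helper functions (`clean`, `impute_spend`, `add_features`) are all hypothetical and invented for illustration, and it uses only the Python standard library.

```python
from datetime import datetime
from statistics import median

# Hypothetical raw records, invented for illustration only.
RAW_ROWS = [
    {"customer": " Alice ", "signup": "2021-03-01", "spend": "120.50"},
    {"customer": "bob", "signup": "2021-03-15", "spend": ""},
    {"customer": "Alice", "signup": "2021-03-01", "spend": "120.50"},
    {"customer": "Carol", "signup": "not a date", "spend": "80"},
]


def clean(rows):
    """Normalize names, parse types, drop duplicates and unparseable rows."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["customer"].strip().title()
        try:
            signup = datetime.strptime(row["signup"], "%Y-%m-%d").date()
        except ValueError:
            continue  # the date cannot be parsed, so the row is dropped
        spend = float(row["spend"]) if row["spend"].strip() else None
        key = (name, signup)
        if key in seen:
            continue  # same customer and signup date already kept
        seen.add(key)
        cleaned.append({"customer": name, "signup": signup, "spend": spend})
    return cleaned


def impute_spend(rows):
    """Fill missing spend with the median of the known values.

    Assumes at least one row has a known spend value.
    """
    fill = median(r["spend"] for r in rows if r["spend"] is not None)
    return [
        {**r, "spend": fill if r["spend"] is None else r["spend"]}
        for r in rows
    ]


def add_features(rows):
    """Derive simple features: signup weekday and a high-spender flag."""
    return [
        {
            **r,
            "signup_weekday": r["signup"].weekday(),  # Monday is 0
            "high_spender": r["spend"] > 100,  # threshold is arbitrary
        }
        for r in rows
    ]


features = add_features(impute_spend(clean(RAW_ROWS)))
for row in features:
    print(row)
```

Even in this toy case, three of the four functions deal with cleaning and preparation, and none of them trains a model. Median imputation and the spend threshold are just one possible choice each; on a real dataset those decisions depend on the problem.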

Progress to Associate Data Scientist: Step out of the bubble and into the real world

Real-world problems are not like off-the-shelf datasets. Many aspects of the data are unknown at the start. Often, you need to think about what data is needed and hunt until you find it. You’ll be working on a variety of real-world problems using structured and unstructured data sources.

1. Critical Thinking and Presentation

We’ve often seen that business leaders and data scientists don’t see eye to eye on a given scenario. Bridging the gap between the two has been the most difficult part of providing a solution. As an associate data scientist, you will assist experienced data scientists on the team and learn to bridge that gap. You will learn what to expect from a customer and how to present a solution in its best light while remaining ethical.

  • The ‘good enough’ answer is never the right answer. Think curiously about the problem and develop your intuition.
  • Consider the credibility of your data. Not all sources are dependable.
  • Get ready to embrace complex situations. Data security might be a concern, you might get encrypted data, or you may need to synthesize data due to data privacy policies.
  • Trick-or-treat: we often don’t get to take all the candy home. Trade-offs are a necessary evil, and you will learn to make balanced decisions.

2. Teamwork and Collaboration

Building a data science solution is not a one-person job. You will be working with a team of data scientists to develop modeling and data strategies. In practice, this means collaborating with your team members to develop use cases with the business goal in mind, and using collaboration and tracking tools such as Git for source control and Jira for task management to set up a healthy workflow for team contributions. Working in a team setting is crucial to developing a viable solution that runs on data and has measurable impact. You’ll learn to implement machine learning algorithms at scale and configure end-to-end data pipelines on cloud services like Azure, AWS, and GCP.

Additionally, as a practicing data scientist, you’ll be passing along what you have learned to newer data science trainees, ramping them up for real-world problems, and mentoring our bootcamp attendees through interesting Kaggle competitions. Over a few months you will evolve into an exceptional data scientist with a holistic understanding of real-world applications of machine learning and predictive analysis.

Being a data scientist is about anticipating and solving problems while remaining within ethical boundaries. The journey from trainee to full-stack data scientist will transform you into a thought leader. You’ll be sworn in by reading the Hippocratic oath of a data scientist, which helps guide your decision making.

[Hippocratic Oath Image]

You will focus on the bigger picture of a business problem while being aware of the uncertainties in your data. The key aspects of the role are charting out project strategy, managing team members, providing recommendations on data modeling, and extracting valuable insights from data. We’ll work on your individual growth as a leader and improve your ability to foster organizational engagement.

Data Science Dojo is a unique workplace that invests heavily in employees and growth. Being one of the foremost global data science companies, we ensure that our clients and customers get the best services and products. All employees hired for a data science role go through a rigorous training plan that helps them improve their written and visual communication skills, technical skills, and project management skills. All these skills are critical for your success as a data scientist.


This is a companion discussion topic for the original entry at https://blog.datasciencedojo.com/p/2fb4ac61-1db8-495e-8d8d-7488f428e693/