Data Science interviews test many technical skills depending on the role or the level.
Algorithmic Coding Prep
- This is probably the most important component for software engineering roles, but less so for data scientists. It does become important if you are interested in roles like Research Scientist or Research Data Scientist.
- A common mistake is to spend too much time worrying about this round and too much time trying to get good at it.
- Most interviews for data scientist roles will cover arrays, but rarely advanced concepts like binary trees. So spend most of your time on Easy and Medium array questions.
- All array questions can be categorized into 6-7 building blocks (e.g., two-sum problems, binary search problems, reverse traversal problems). If you are comfortable with those building blocks, you can code any array question by relating it to a building block and modifying that block's boilerplate code. The same goes for other concepts. This is the best way to prepare.
- The best place to learn the building blocks is https://interviewcamp.io/
- Practice questions at https://leetcode.com/. Focus on Easy and Medium array questions.
- A very good subset of coding questions: https://www.teamblind.com/post/New-Year-Gift---Curated-List-of-Top-75-LeetCode-Questions-to-Save-Your-Time-OaM1orEU
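To make the building-block idea concrete, here is a minimal sketch of two of the blocks named above: a two-sum lookup and a binary search. The function names and signatures are my own choices for illustration, not taken from any of the linked resources.

```python
def two_sum(nums, target):
    """Return indices (i, j) with nums[i] + nums[j] == target, or None."""
    seen = {}  # value -> index where that value was first seen
    for j, value in enumerate(nums):
        i = seen.get(target - value)
        if i is not None:
            return i, j
        seen[value] = j
    return None


def binary_search(sorted_nums, target):
    """Return an index of target in sorted_nums, or -1 if it is absent."""
    lo, hi = 0, len(sorted_nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_nums[mid] == target:
            return mid
        if sorted_nums[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1
```

Many Easy and Medium array questions are small variations on templates like these, for example returning the values instead of the indices, or searching for the first element greater than the target.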
- You cannot afford to bomb SQL rounds. So do well here.
- https://mode.com/sql-tutorial/introduction-to-sql/ is the best place to brush up on SQL skills. It's concise and to the point.
- https://selectstarsql.com/ seems like a great place to learn SQL.
- https://leetcode.com/problemset/all/?listId=5htp6xyg&page=1 has good practice questions.
- My Mode SQL notes
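For a feel of what a typical SQL round asks, here is a small sketch you can run locally with Python's built-in `sqlite3` module. The `users` and `orders` tables and their rows are made up for illustration. The query combines a `LEFT JOIN`, `GROUP BY`, and an aggregate, which is a pattern worth knowing cold.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 30.0), (3, 2, 5.0);
""")

# Order count and total spend per user, keeping users with no orders.
query = """
    SELECT u.name,
           COUNT(o.id) AS n_orders,
           COALESCE(SUM(o.amount), 0) AS total_spent
    FROM users AS u
    LEFT JOIN orders AS o ON o.user_id = u.id
    GROUP BY u.id, u.name
    ORDER BY total_spent DESC, u.name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Notice that `COUNT(o.id)` counts only matched orders, so a user without orders gets 0 rather than 1. Interviewers often probe exactly this kind of detail.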
- This round is about how well you can manipulate data using pandas in Python or similar libraries in R.
- The only way to prepare for this round is by practicing as much as you can. No amount of reading will help here.
- Practice EDA on many publicly available data sets (changing column types, treating missing values, handling outliers, feature selection, feature engineering, etc.).
- Practice coding famous algorithms like K-Means, Linear Regression, Decision trees, etc., from scratch.
- https://www.kaggle.com/c/titanic has the Titanic data set and example EDAs.
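As an example of coding an algorithm from scratch, here is a minimal K-Means sketch (Lloyd's algorithm) in plain Python. It assumes 2-D points and initial centroids supplied by the caller. A real implementation would also handle random initialization and arbitrary dimensions, so treat this as a starting point rather than a reference implementation.

```python
def kmeans(points, centroids, n_iters=10):
    """Lloyd's algorithm on 2-D points with caller-supplied initial centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                + (p[1] - centroids[k][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                new_centroids.append(old)  # leave an empty cluster where it is
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, centroids=[(0, 0), (10, 10)])
print(labels)
```

Being able to explain the two steps (assign, then update) and the stopping condition matters more in an interview than the exact code.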
- One type of interview question is a case study: you are given a use case and need to walk the interviewer through what data you would need, what target variable you would choose, your modeling procedure, and your evaluation metrics.
- The best way to prepare for this would be to do some mock interviews.
- Another type of question covers ML fundamentals, such as the difference between Lasso and Ridge regression, with the interviewer gradually digging deeper through follow-up questions.
- In general, you need to be familiar with how most ML models work and what their advantages and disadvantages are. This applies to all ML concepts.
- Some good cheat sheets: https://sites.google.com/view/datascience-cheat-sheets#h.h40dwqqwv30w
- Chris Albon’s flash cards are very useful: https://machinelearningflashcards.com/
- Another online source: https://towardsdatascience.com/the-data-science-interview-blueprint-75d69c92516c
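For the Lasso vs Ridge question mentioned above, a small numeric sketch can help build intuition. It relies on a simplifying assumption: an orthonormal design matrix (XᵀX = I), with objectives ½‖y − Xβ‖² + λ‖β‖₁ for Lasso and ½‖y − Xβ‖² + (λ/2)‖β‖² for Ridge. Under that assumption both solutions have a closed form in terms of the ordinary least squares coefficients. The function names here are hypothetical.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge scales every coefficient toward zero by the same factor."""
    return beta_ols / (1.0 + lam)


def lasso_shrink(beta_ols, lam):
    """Lasso soft-thresholds: small coefficients become exactly zero."""
    if beta_ols > lam:
        return beta_ols - lam
    if beta_ols < -lam:
        return beta_ols + lam
    return 0.0


ols_coefs = [3.0, 0.5, -2.0]  # made-up OLS coefficients
lam = 1.0
ridge_coefs = [ridge_shrink(b, lam) for b in ols_coefs]
lasso_coefs = [lasso_shrink(b, lam) for b in ols_coefs]
print(ridge_coefs)
print(lasso_coefs)
```

Ridge keeps every coefficient nonzero but smaller, while Lasso sets the small one exactly to zero. That is why Lasso is often described as doing feature selection. With correlated features there is no such closed form, but the qualitative behavior is similar.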
The best way to reach me is via Twitter or LinkedIn DMs.