From time to time, I am asked, “How does one get started with Data Science?” This post is my answer.
I have good news for you. If you have studied math up to the second-year level of a technical university and you know how to code, you already have a solid background to get started. It is best if you know Python, as it is the default language of Data Science these days. If you don’t know how to code, start by learning Python (choose Python 3, not 2.7) and then come back here.
Teach thyself
Here is a self-study plan:
- Take the Open Data Science (ODS) course at https://mlcourse.ai/ and, if you speak Russian, make sure to join their Slack channel via http://ods.ai, a vibrant Russian-speaking Data Science community
- Read https://shapeofdata.wordpress.com, a very nice blog for developing geometric intuition for Machine Learning algorithms
- In parallel, review basic math concepts “on demand”. You will need some basic Linear Algebra, Multivariable Calculus, Probability, and Statistics
- Also in parallel, apply your knowledge to many practical problems. Practice makes perfect. Code your solutions and strive to improve your coding (you can also read about “Clean code”, “Design Patterns”, and “Algorithms and data structures”). Do a fun weekend project and participate in Kaggle competitions. Read about solutions to past competitions at http://ndres.me/kaggle-past-solutions/
- In parallel, read popular Data Science / Machine Learning blogs
- For a statistical perspective on ML, study the mathematicalmonk lectures
- Read Domingos, P. 2012. A few useful things to know about machine learning
- Optional: for a quick overview of ML theory, see https://mostafa-samir.github.io/ml-theory-pt1/. To go deeper, check the theory books below or watch the Understanding Machine Learning course by Shai Ben-David
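To give a flavor of how the math from the plan above shows up in practice, here is a minimal sketch in pure Python (standard library only, so it runs without installing anything). It fits a straight line y ≈ a·x + b to synthetic data in two ways: with the closed-form least-squares formulas from statistics (slope = covariance / variance), and with gradient descent, which uses the partial derivatives of the mean squared error from multivariable calculus. The synthetic data, the learning rate, and the number of steps are arbitrary assumptions for illustration, not taken from any of the courses listed here.

```python
import random


def make_data(n=200, slope=2.0, intercept=1.0, noise=0.5, seed=0):
    """Generate synthetic points scattered around y = slope * x + intercept."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 5.0) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def fit_closed_form(xs, ys):
    """Ordinary least squares with one feature: slope = cov(x, y) / var(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def fit_gradient_descent(xs, ys, lr=0.02, steps=3000):
    """Minimize the mean squared error with full-batch gradient descent."""
    n = len(xs)
    a, b = 0.0, 0.0
    for _ in range(steps):
        # Residuals of the current line on every data point.
        residuals = [a * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = mean(residual ** 2) w.r.t. a and b.
        grad_a = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        # Step against the gradient.
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


if __name__ == "__main__":
    xs, ys = make_data()
    a_cf, b_cf = fit_closed_form(xs, ys)
    a_gd, b_gd = fit_gradient_descent(xs, ys)
    print(f"closed form:      a={a_cf:.3f}, b={b_cf:.3f}")
    print(f"gradient descent: a={a_gd:.3f}, b={b_gd:.3f}")
```

Both fits should agree closely and land near the slope and intercept used to generate the data. With more than one feature, the closed form becomes the normal equations XᵀXw = Xᵀy, which is where Linear Algebra enters.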
Below is a list of references to help you navigate the Data Science landscape.
References
Python / Programming
- https://developers.google.com/edu/python/
- https://www.python.org/dev/peps/pep-0008/
- https://learnxinyminutes.com/ is a very good way to get started with a new programming language
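PEP 8, linked above, is the style guide most Python code follows. As a small illustration of a few of its naming and layout conventions (CapWords for classes, snake_case for functions and variables, UPPER_CASE for module-level constants, four-space indentation, two blank lines between top-level definitions), here is a made-up module. The class, function, and constant names are hypothetical and chosen only for this example; PEP 8 itself covers much more.

```python
"""A tiny example module laid out following common PEP 8 conventions."""

import statistics

# Module-level constants use UPPER_CASE names.
DEFAULT_THRESHOLD = 0.5


class ScoreSummary:
    """Class names use CapWords."""

    def __init__(self, scores):
        # Attributes and local variables use snake_case.
        self.scores = list(scores)

    def mean_score(self):
        """Method names are lowercase, with words separated by underscores."""
        return statistics.mean(self.scores)


def count_above_threshold(scores, threshold=DEFAULT_THRESHOLD):
    """No spaces around '=' in keyword defaults; indent with four spaces."""
    return sum(1 for score in scores if score > threshold)


if __name__ == "__main__":
    summary = ScoreSummary([0.2, 0.7, 0.9])
    print(summary.mean_score())
    print(count_above_threshold(summary.scores))
```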
Lectures / online courses
- https://mlcourse.ai/ is a great course to get started with Data Science. It is made by the Open Data Science community, and its original Russian version is also available at https://habr.com/company/ods/blog/322626/
- mathematicalmonk is my favorite online course on Machine Learning from a statistician’s perspective. He also has a nice series on probability
- Artificial Intelligence from MIT OpenCourseWare
- Computational Statistics and Statistical Computing
- http://cs229.stanford.edu/ ML course from Stanford
- http://cs231n.stanford.edu/ CNN course from Stanford
- http://cs224d.stanford.edu/ NLP course from Stanford
- https://developers.google.com/machine-learning/crash-course/, an ML crash course from Google
- https://developers.google.com/machine-learning/guides/rules-of-ml/, Google’s ML engineering good practices
- Understanding Machine Learning Course by Shai Ben-David
- Machine Learning lectures by Pedro Domingos
Blogs / websites
- https://www.reddit.com/r/MachineLearning/
- https://www.kdnuggets.com/news/index.html
- http://fastml.com
- https://mlwave.com
- http://www.wildml.com/
- http://blog.datadive.net/
- https://explained.ai/
- http://hunch.net
- http://blog.kaggle.com/
- https://ai.googleblog.com/
- http://arogozhnikov.github.io/
- https://shapeofdata.wordpress.com is a very nice blog (actually, it should be a book) for developing geometric intuition for Machine Learning algorithms
- http://colah.github.io, the blog of C. Olah, with some nice explanations
- https://distill.pub gives clear explanations of interesting ML research
- https://twiecki.github.io/
- https://blog.acolyer.org/, a blog that summarizes interesting Computer Science papers
- https://www.data-to-viz.com/, not sure which visualization to choose for your data?
- https://ohshitgit.com/, got a problem with git?
- https://visualgo.net/en gives visual explanations of algorithms and data structures
- https://www.geeksforgeeks.org/, a Computer Science website for geeks
- https://metacademy.org/ is a package manager for knowledge
- http://explorabl.es/, fun visual explanations to take a break from your Data Science studies
- http://www.cseblog.com/ keeps your brain fit
- Math courses at http://brilliant.org/
Papers
Books
Preliminaries
- Boyd, Stephen, and Lieven Vandenberghe. 2004. Convex Optimization
- Rozanov, Yurii A. 2013. Probability theory: a concise course
- Nesterov, Yurii. 2013. Introductory lectures on convex optimization: A basic course
- Computer Science Theory for the Information Age
- Skiena, Steven S. 1998. The algorithm design manual
- Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to algorithms
- Martin, Robert C. 2009. Clean code: a handbook of agile software craftsmanship
Machine Learning
- Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2001. The Elements of Statistical Learning
- Murphy, Kevin P. 2012. Machine Learning: A Probabilistic Perspective
- Interpretable Machine Learning
Neural Networks
- Neural Networks and Deep Learning, a nice getting started book on Neural Networks
- Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning
More theoretical Machine Learning
- Mohri, Mehryar, Afshin Rostamizadeh, and Ameet Talwalkar. 2012. Foundations of Machine Learning
- Shalev-Shwartz, Shai, and Shai Ben-David. 2014. Understanding Machine Learning: From Theory to Algorithms
Stats
- Wasserman, Larry. 2013. All of Statistics: A Concise Course in Statistical Inference
- Casella, George, and Roger L. Berger. 2002. Statistical Inference