These are just personal notes to keep track of my data science learning path =), but I’m sharing them hoping that they’ll be useful to somebody. Suggestions are welcome.
Intro
The “philosophy” behind this list is: get hands-on with the core data science concepts while minimizing (:D) the number of languages and new tools to learn. Since I don’t have 40 hrs/wk to dedicate, I want to grasp the concepts and leave the tools in the background. The selected tools are:
- MATLAB (which I know best)
- Python + libraries (e.g. scikit-learn, pandas, matplotlib, seaborn…)
I prefer not to use Julia and/or R for the time being.
Foundation stones
1. Andrew Ng’s “Machine Learning” course via Coursera
Strongly suggested to anybody who wants to grasp the basic concepts. The course is relatively simple if you have already taken Calculus, some probability theory, some graph theory, and you have already put your hands on MATLAB. The really good thing is that you are required to implement (and therefore to think deeply about) some of the most classical/basic ML algorithms. The only thing I would have added is Decision Trees, but I’m sure the professor had a good reason not to include them.
Reviews are everywhere on the web (and definitely positive!), if you’re not convinced yet.
Selected additional useful resources found on the web
- Python code for Andrew Ng’s machine learning class (the original version is developed in MATLAB/Octave)
- Official course slides (with annotations, really useful as a refresher)
- Some student’s notes.
I completed the course, but I won’t share my solutions/scripts, since I think that would be useless and unfair.
2. Sebastian Raschka’s book: “Python Machine Learning”
Sebastian is a PhD candidate in computational neurobiology. I chose his book because of the really good reviews it received and because it does not only present “recipes” to be used with scikit-learn, but wants you to grasp the concepts behind the algorithms, either through simplified implementations (e.g. chapter 1) or through meaningful examples.
I think you can’t appreciate this book if you are completely new to ML. I am reading it now and find it quite pleasant: it is well written and has a good balance between intuition and application. Definitely a good way to start with scikit-learn.
Resources:
- Book presentation by the author
P.S. As mentioned in previous posts, I first started with a “Titanic Kaggle competition tutorial”, but I really don’t like using algorithms I don’t understand; that’s why I put this book first.
3. [update] Probabilistic graphical models (Stanford, Coursera)
Reviews (1 | 2 | coursetalk) for this course are extremely good. Expect 10-20 hrs a week, depending on your background and on the programming assignments you want to carry out.
- What we learned from online education, TED talk by Daphne Koller (course instructor)
4. Next… to be decided
I really like to plan ahead, so I will place here some resources for the future. These are just personal notes, and I may not have verified their quality yet.
Edit: I will probably continue with The Elements of Statistical Learning
MOOC Courses: A selection of courses to check out
- 13-week data science course from Harvard
- Harvard’s online course in data science (reviewers say it is hard)
- Finished Andrew Ng’s course: where to go now? A real Caltech course (not a watered-down version).
“Learning from Data”, an 18-hour series of video lectures recorded at Caltech. Estimated time: 10 weeks.
Pros: includes homework and solution keys.
Consider: seems to be mostly theoretical
Seems language-agnostic
- A real course at Carnegie Mellon
- For Deep Learning: https://classroom.udacity.com/courses/ud730 (Deep Learning with TensorFlow)
- For Probabilistic Graphical Models: https://class.coursera.org/pgm. Reviews are extremely good; homework is in MATLAB
- [NOT FREE]: DataCamp courses. $25/month. They seem pretty good; maybe when I have free time to spare.
Books:
- The Elements of Statistical Learning (book) / An Introduction to Statistical Learning, the latter being a simplified version.
There is also a solution manual available, extremely helpful for self-study.
All the exercises and examples presented are written in R. Fortunately someone provided IPython notebooks, for instance here or here.
Problems and competitions
- Kaggle: the home of data science (or: JOIN A COMPETITION!)
- Quora: good toy problems in data science
- https://www.crowdanalytix.com/ : one data analytics challenge solved per week
- Please recommend books/websites to attain the skills needed to take part in competitions
- Titanic: Machine Learning from disaster (tutorial+project)
- Kaggle competition solutions
- A Trello collection on data science
Learning path “philosophy” and questions:
- Review: My education in Machine Learning via Coursera, a review so far
- Suggested online curriculum for data science
- Review of top 10 data science courses
- How do I learn machine learning?
- Beginner’s machine learning resources
- Why becoming a data scientist is hard
- How can a beginner train for ML contests?
- http://www.kdnuggets.com/2015/09/top-20-data-science-moocs.html