
A consistent data science study program with Matlab + Python

These are just personal notes to keep track of my data science learning track =), but I’m sharing them hoping that they’ll be useful to somebody. Suggestions are welcome.


The “philosophy” behind this list is: get hands-on with the core data science concepts while minimizing (:D) the number of languages and new tools to learn. Since I don’t have 40 hrs/wk to dedicate, I want to grasp the concepts and keep the tools in the background. The selected tools are:

  • MATLAB (which I know best)
  • Python + libraries (e.g. scikit-learn, pandas, matplotlib, seaborn…)

I prefer not to use Julia and/or R for the time being.

Foundation stones

1. Andrew Ng’s “Machine Learning” course via Coursera

Strongly suggested to anybody who wants to grasp the basic concepts. The course is relatively simple if you have already taken Calculus, some probability theory, some graph theory, and you have already put your hands on MATLAB. The really good thing is that you are required to implement (and therefore to think deeply about) some of the most classical/basic ML algorithms. The only thing I would have added is Decision Trees, but I’m sure the professor had a good reason not to include them.

Reviews are everywhere on the web (and definitely positive!), if you’re not convinced yet.

Selected additional useful resources found on the web

I did the course but I won’t share the solutions/scripts, since I think that would be useless and unfair.

2. Sebastian Raschka’s book: “Python Machine Learning”

Sebastian is a PhD candidate in computational neurobiology. I chose his book because of the really good reviews it received and because it does not only present “recipes” to be used with scikit-learn but wants you to grasp the concepts behind the algorithms, either through simplified implementations (e.g. chapter 1) or through meaningful examples.
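To give an idea of what a “simplified implementation” can look like, here is a minimal sketch that assumes a perceptron as the example algorithm. It is not code from the book: the function names, the toy data (a logical AND) and the default parameters are all assumptions made for illustration.

```python
# Minimal perceptron sketch in plain Python (no external libraries).
# Illustrative only: names, data and defaults are assumptions, not the book's code.

def train_perceptron(samples, labels, learning_rate=1.0, epochs=20):
    """Fit weights and bias with the perceptron rule; labels must be +1 or -1."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            prediction = 1 if activation >= 0.0 else -1
            if prediction != y:
                # Move the decision boundary toward the misclassified sample.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
                errors += 1
        if errors == 0:
            # A full pass without mistakes: the training data is separated.
            break
    return weights, bias


def predict(weights, bias, x):
    """Return +1 or -1 depending on the side of the decision boundary."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation >= 0.0 else -1


# Toy, linearly separable data: logical AND encoded with -1/+1 labels.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [-1, -1, -1, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(predictions == labels)
```

Writing the update rule by hand like this is the kind of exercise that makes a later scikit-learn call feel less like a black box.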

I think you can’t appreciate this book if you are completely new to ML. I am reading it and find it quite pleasant: it is well written and has a good balance between intuition and application. Definitely a good way to start with scikit-learn.


P.S. As mentioned in previous posts, I first started with a “Titanic Kaggle competition tutorial”, but I really don’t like using algorithms I don’t understand; that’s why I put this book first.

3. [update] Probabilistic graphical models (Stanford, Coursera)

Reviews (1 | 2 | coursetalk) for this are extremely good. It takes 10-20 hrs a week, depending on your background and on the programming assignments you want to carry out.

4. Next… to be decided

I really like to plan ahead, so I will place here some resources for the future. These are just personal notes and I have not verified their quality yet.

Edit: I will probably continue with “The Elements of Statistical Learning”.

MOOC Courses: A selection of courses to check out
  • 13 Weeks data science course from Harvard
  • Harvard’s online course in data science (reviewed to be hard)
  • Finished Andrew Ng’s course: where to go now? A real Caltech course (not a watered-down version).
    “Learning from Data”, an 18-hour series of video lectures recorded at Caltech. Estimated time: 10 weeks.
    Pros: includes homework and solution keys.
    Consider: seems to be mostly theoretical
    Seems language-agnostic
  • A real course at Carnegie Mellon
  • For Deep Learning: Deep Learning with TensorFlow
  • For Probabilistic Graphical Models: Reviewed to be extremely good, homeworks in MATLAB
  • [NOT FREE]: DataCamp courses. $25/month. They seem pretty good; maybe when I have free time to spare.
Problems and competitions
Learning path “philosophy” and questions:
Selected papers: