This blog post is for people who have just started learning machine learning concepts and are unsure how to proceed.

First, you need an appropriate grounding in mathematics and computer science. Then you need to read about machine learning (there are several good books, such as Chris Bishop's and Kevin Murphy's, and online videos, such as Andrew Ng's Coursera class and Hugo Larochelle's videos on neural networks; you can also get a summary of many of the basic issues in chapter 5 of the Deep Learning book).

The Coursera Machine Learning course by Andrew Ng is a good starting point. The course is taught using Octave/MATLAB.

Then you need to start practicing, i.e., programming some learning algorithms yourself and playing with them on data; try competing in some Kaggle competitions, for example. Choose a programming language (R / Python / Julia). I started coding my models in Python, as it is a little easier than R and has a gentler learning curve.
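As a first exercise in programming a learning algorithm yourself, here is a minimal sketch of linear regression trained by batch gradient descent with NumPy. The function name `fit_linear_regression`, the learning rate and the iteration count are illustrative choices rather than anything prescribed by the courses above, and the data is a noise-free toy line y = 2x + 1, so the recovered weight and bias should come out close to 2 and 1.

```python
import numpy as np


def fit_linear_regression(X, y, lr=0.1, n_iters=5000):
    """Fit y ≈ X @ w + b by batch gradient descent on the mean squared error.

    fit_linear_regression is a hypothetical helper written for this example.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        error = X @ w + b - y  # residual for every sample
        # Gradients of (1 / (2n)) * sum(error**2) with respect to w and b
        grad_w = X.T @ error / n_samples
        grad_b = error.mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (no noise), one feature per sample
X = np.linspace(0, 1, 50).reshape(-1, 1)
y = 2 * X[:, 0] + 1

w, b = fit_linear_regression(X, y)
print(f"learned weight: {w[0]:.3f}, learned bias: {b:.3f}")
```

Once this works, a natural next step is to add noise to `y`, vary `lr`, and watch how the fit and the convergence speed change.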

The Python libraries you will need are:

  1. numpy (makes numerical and matrix calculations easy)
  2. pandas (for working with tabular data such as spreadsheets and .csv files)
  3. matplotlib / seaborn (for visualizing data)
  4. scikit-learn (for ready-made machine learning algorithms)
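To see how these libraries fit together, here is a minimal sketch that uses numpy to generate a small synthetic two-class dataset, pandas to hold it as a table, matplotlib to plot it, and scikit-learn to train and evaluate a logistic regression classifier. The dataset is made up purely for illustration, and logistic regression is just one simple model choice. seaborn, which builds on matplotlib, is left out to keep the sketch short.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# numpy: generate a small synthetic two-class dataset (illustrative only)
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
label = (x1 + x2 > 0).astype(int)  # class 1 when x1 + x2 is positive

# pandas: hold the data in a table, much like a loaded .csv file
df = pd.DataFrame({"x1": x1, "x2": x2, "label": label})
print(df.describe())

# matplotlib: scatter plot of the two features, coloured by class
fig, ax = plt.subplots()
ax.scatter(df["x1"], df["x2"], c=df["label"], cmap="coolwarm", s=10)
ax.set_xlabel("x1")
ax.set_ylabel("x2")
buf = io.BytesIO()
fig.savefig(buf, format="png")  # saved in memory here; pass a file name to write to disk
plt.close(fig)

# scikit-learn: train a classifier and measure accuracy on held-out data
X_train, X_test, y_train, y_test = train_test_split(
    df[["x1", "x2"]], df["label"], test_size=0.25, random_state=0
)
model = LogisticRegression()
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Holding out a test set with `train_test_split` matters even in toy examples: accuracy measured on the training data alone would overstate how well the model generalizes.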

DataCamp offers a variety of online courses and video tutorials for learning both R and Python for data science. Python Programming also has a very good collection of video lectures for learning these Python libraries.

Now you are all set to start on your first problem. I would suggest starting with the Titanic problem on Kaggle, as it has very comprehensive tutorials and an abundance of scripts available to help you build a good solution.
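A first Titanic submission usually starts from a simple baseline. The sketch below assumes the column names of Kaggle's Titanic `train.csv` (`Survived`, `Pclass`, `Sex`, `Age`, `SibSp`, `Parch`, `Fare`); the helpers `prepare_features` and `baseline_score` are hypothetical names, and the random forest is one common baseline choice, not the approach of any particular tutorial. It encodes `Sex` as a number, fills missing ages and fares with the median, and reports mean cross-validated accuracy.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Assumed subset of the columns in Kaggle's Titanic train.csv
FEATURES = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]


def prepare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Turn the raw Titanic columns into a purely numeric feature table (hypothetical helper)."""
    X = df[FEATURES].copy()
    X["Sex"] = X["Sex"].map({"male": 0, "female": 1})  # encode the text column as numbers
    # Fill missing ages and fares with the median so the model sees no NaNs
    X["Age"] = X["Age"].fillna(X["Age"].median())
    X["Fare"] = X["Fare"].fillna(X["Fare"].median())
    return X


def baseline_score(train: pd.DataFrame, cv: int = 5) -> float:
    """Mean cross-validated accuracy of a random forest on the prepared features."""
    X = prepare_features(train)
    y = train["Survived"]
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
```

Assuming the competition file has been downloaded to the working directory, usage would look like `baseline_score(pd.read_csv("train.csv"))`. Cross-validation gives a more reliable estimate than a single train/test split when the dataset is this small, and the baseline score is a useful reference point when trying the feature ideas from the Kaggle tutorials.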

Before choosing your first problems, I suggest reading the Analytics Vidhya blog post on Kaggle.


Hi, I am Ankita. I work at Intuit India. I am passionate about machine learning and artificial intelligence.