Coursera Machine Learning Review: Octave and Gradient Descent

This is a series where I’m discussing what I’ve learned in Coursera’s machine learning course taught by Andrew Ng of Stanford University.  Why?  See Machine Learning, Nanodegrees, and Bitcoin.  I’m definitely not going into depth, just briefly summarizing from a 10,000-foot view.

This is a continuation of week 2.

Installing Octave

I use Ubuntu 16.04.  If you want to install Octave on your OS, I’m sure there are plenty of resources out there telling you how to do that.  For me, the install was just
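On Ubuntu 16.04 that means installing from the default repositories, something like:

```shell
# Install Octave from Ubuntu's default package repositories
sudo apt-get update
sudo apt-get install octave
```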

Granted, I later figured out that version of Octave had a bug with Coursera’s submit function for this course.  I ended up having to pull from a different repo in order to get a more recent version of Octave.  I can’t remember the exact command I used, but it was similar to the following:
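The general pattern for pulling a newer Octave from a PPA looks like this (the PPA name here is illustrative, since I don’t remember the exact one I used):

```shell
# Add a PPA carrying a newer Octave release, then reinstall
# (PPA name is from memory -- verify before using)
sudo add-apt-repository ppa:octave/stable
sudo apt-get update
sudo apt-get install octave
```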

Once Octave is installed, “octave-cli” from the command line will launch Octave within the terminal.

A Few Basic Octave Commands

Programming

This is certainly not exhaustive, but here are a few example commands, where I’m starting from a gnome-terminal in Ubuntu 16.04.  You’ll notice from the few errors in there that I’m not perfect. 🙂
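A cleaned-up sample of the kinds of commands I tried:

```octave
% Basic arithmetic and variables
a = 5 + 3          % prints a = 8
b = a ^ 2;         % a trailing semicolon suppresses output

% Vectors and matrices
v = [1 2 3];       % 1x3 row vector
A = [1 2; 3 4];    % 2x2 matrix
A'                 % transpose
A * A              % matrix multiplication
A .* A             % element-wise multiplication

% Useful built-ins
size(A)            % dimensions: 2 2
ones(2, 3)         % 2x3 matrix of ones
disp(sprintf('pi to 5 digits: %0.5f', pi))
```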

The Assignment

The assignment starts with loading data from a file and then plotting it.  Most of this code has already been done, or the commands needed are specified within the assignment itself.
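The provided starter code does something along these lines (a sketch, not the course’s exact file):

```octave
% Load comma-separated data: column 1 = population, column 2 = profit
data = load('ex1data1.txt');   % data file name as used in the assignment
X = data(:, 1);                % feature (city population)
y = data(:, 2);                % target (profit)
m = length(y);                 % number of training examples

% Scatter plot of the raw training data
plot(X, y, 'rx', 'MarkerSize', 10);
xlabel('Population of City in 10,000s');
ylabel('Profit in $10,000s');
```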

This kind of annoyed me – it gives the easy stuff, then leaves the math-intensive part for us to do.  I feel like I would have been a lot better prepared for this assignment if we had been using Octave all along during our lectures.  Nevertheless…

It then asks us to program a compute cost function (solving J(theta)).  After that, it asks us to program the gradient descent function, which uses our compute cost function.  In both cases, the expected answer is given so we can validate our program.  In addition, we can submit our answers to Coursera in order to get almost immediate feedback on our progress.
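My take on the two functions in vectorized form – variable names follow the assignment’s conventions, but treat this as a sketch rather than the official solution:

```octave
function J = computeCost(X, y, theta)
  % X is m x 2 (first column all ones), y is m x 1, theta is 2 x 1
  m = length(y);
  errors = X * theta - y;           % m x 1 vector of prediction errors
  J = (errors' * errors) / (2 * m); % sum of squared errors via matrix multiply
end

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    errors = X * theta - y;
    theta = theta - (alpha / m) * (X' * errors); % simultaneous update of all thetas
    J_history(iter) = computeCost(X, y, theta);  % track cost at each iteration
  end
end
```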

Figuring out the math for this took me forever.  I tried to put what was in the assignment directly into Octave, but I got confused.  How do I represent a summation?  I knew I could represent those values as matrices in order to do them all at once, but I couldn’t figure out the right conversion.  I tried the sum function, until finally I realized a big mathematical insight…

The summation of those matrices is the same as a matrix multiplication!

In addition, the second big insight was keeping the dimensions of my matrices in mind.  I knew theta was 2×1 to begin with.  So if I need to multiply another matrix by the theta vector, that matrix needs to have 2 columns.  Seeing as some of my matrices were 97×1 column vectors, I just needed to transpose them to get the inner dimensions to line up and play nice with theta.

To sum it up, these two insights got me through the lesson (assuming I used vector forms for my problems):

  1. The summation can be converted to matrix multiplication
  2. Look at the size of your matrices to determine how to convert them to the proper form to multiply them
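Both insights fit in one tiny example (the numbers here are made up):

```octave
x = [1; 2; 3];     % 3 x 1 column vector
y = [4; 5; 6];     % 3 x 1 column vector

% Insight 1: a summation of products is just a matrix multiplication
s1 = sum(x .* y);  % element-wise multiply, then sum: 32
s2 = x' * y;       % (1x3) * (3x1) = the same scalar: 32

% Insight 2: check your dimensions before multiplying.
% x * y fails (3x1 times 3x1), but transposing makes the inner
% dimensions match: (1x3)*(3x1) gives a scalar, (3x1)*(1x3) a matrix.
outer = x * y';    % 3 x 3 outer product
```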

I really appreciate the graphs the lesson gives.  The cost function is graphed so you can see that a certain value gives you a minimum… thus your hypothesis function works best when you use those values of theta.  Graphing the data makes it so much easier to understand.

Conclusion

While this assignment was difficult due to getting my data into the right vector form, I’m hoping future assignments will be a little easier now that I understand how to manipulate the data into the right vector representation.  The programming part of this course (the implementation part rather than the theory part) was, as I expected, much more exciting.  I enjoy theory, but only insofar as I can experiment with it in order to learn better.  Hopefully the following weeks will continue to have programs I can use to help me understand the material.
