This is a series where I’m discussing what I’ve learned in Coursera’s machine learning course, taught by Andrew Ng of Stanford University. Why? See Machine Learning, Nanodegrees, and Bitcoin. I’m definitely not going into depth, just briefly summarizing from a 10,000-foot view.
This is a continuation of week 2.
I use Ubuntu 16.04. If you want to install Octave on your OS, I’m sure there are plenty of resources out there telling you how to do that. For me, the install was just
sudo apt install octave
Granted, I later figured out that version of Octave had a bug with Coursera’s submit function for this course. I ended up having to pull from a different repo in order to get a more recent version of Octave. I can’t remember the exact command I used, but it was similar to the following:
sudo apt-add-repository -y ppa:picaso/octave
sudo apt-get update
sudo apt-get install octave
sudo apt-get install liboctave-dev
Once Octave is installed, “octave-cli” from the command line will launch Octave within the terminal.
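Since the version mattered for me, here is a small sketch of how you could check which Octave you ended up with from inside the prompt. The "4.2.1" threshold is only the version from my own session, used here as an illustrative assumption; I don't know the exact release where the submit bug was fixed.

```octave
% Print the running Octave version (a string such as "4.2.1").
v = version();
printf("Octave version: %s\n", v);

% compare_versions returns true when the comparison holds.
% The "4.2.1" threshold is illustrative only, not a documented requirement.
is_recent = compare_versions(v, "4.2.1", ">=");
printf("At least 4.2.1? %d\n", is_recent);
```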
A Few Basic Octave Commands
This is certainly not exhaustive, but here are a few example commands, where I’m starting from a gnome-terminal in Ubuntu 16.04. You’ll notice by the few errors that are in there that I’m not perfect. 🙂
GNU Octave, version 4.2.1
Copyright (C) 2017 John W. Eaton and others.
This is free software; see the source code for copying conditions.
There is ABSOLUTELY NO WARRANTY; not even for MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. For details, type 'warranty'.
Octave was configured for "x86_64-pc-linux-gnu".
Additional information about Octave is available at http://www.octave.org.
Please contribute if you find this software useful.
For more information, visit http://www.octave.org/get-involved.html
Read http://www.octave.org/bugs.html to learn how to submit bug reports.
For information about changes from previous versions, type 'news'.
octave:1> % Define a 2x3 matrix
octave:1> x = [1 2 3; 4 5 6]
x =
1 2 3
4 5 6
octave:2> % Transpose matrix
octave:2> x'
ans =
1 4
2 5
3 6
octave:3> % Inverse of matrix
octave:3> inv(x)
error: inverse: A must be a square matrix
octave:3> inv([1 2; 3 4])
ans =
-2.0000 1.0000
1.5000 -0.5000
octave:4> % Multiply two matrices
octave:4> x * [1 2; 3 4]
error: operator *: nonconformant arguments (op1 is 2x3, op2 is 2x2)
octave:4> [1 2; 3 4] * x
ans =
9 12 15
19 26 33
The assignment starts with loading data from a file and then plotting it. Most of this code has already been written for us, or the commands needed are specified within the assignment itself.
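To give a rough idea of what the loading step looks like, here is a minimal sketch. The file and its three rows are made up so the example runs on its own; the assignment ships its own data file, and the variable names (data, x, y, m) are just my choice.

```octave
% Write a tiny two-column sample file so the example is self-contained.
% (This data is made up; the assignment provides its own file.)
fname = [tempname() ".txt"];
fid = fopen(fname, "w");
fprintf(fid, "1.0,2.0\n2.0,4.1\n3.0,5.9\n");
fclose(fid);

data = load(fname);   % comma-separated numeric text loads as a matrix
x = data(:, 1);       % first column: input feature
y = data(:, 2);       % second column: target value
m = length(y);        % number of training examples

printf("Loaded %d rows\n", m);
delete(fname);        % clean up the temporary file
```

From there, something like plot(x, y, 'rx') draws the points as red crosses.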
This kind of annoyed me – it gives us the easy stuff, then leaves the math-intensive part for us to do. I feel like I would have been a lot better prepared for this assignment if we had been using Octave all along during our lectures. Nevertheless…
It then asks us to program a function that computes the cost, J(theta). After that, it asks us to program the gradient descent function, which uses our compute cost function. In both cases, the expected answer is given so we can validate our program. In addition, we can submit our answers to Coursera to get almost immediate feedback on our progress.
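For reference, these are the standard formulas for linear regression with one variable, as I understand them from the course. The symbols m (number of training examples) and alpha (learning rate) are the usual notation rather than something I've defined above.

```latex
\begin{align*}
h_\theta(x) &= \theta_0 + \theta_1 x \\
J(\theta) &= \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 \\
\theta_j &:= \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
\quad \text{(updated simultaneously for } j = 0, 1\text{)}
\end{align*}
```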
Figuring out the math for this took me forever. I tried to put what was in the assignment directly into Octave, but I got confused. How do I represent a summation? I knew I could represent those values as matrices in order to do them all at once, but I couldn’t figure out the right conversion. I tried the sum function, until I finally hit on a big mathematical insight…
A summation of element-wise products is the same as a matrix multiplication!
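Here is a tiny sketch of that idea with two made-up column vectors. Summing the element-wise products gives the same number as multiplying the transposed first vector by the second.

```octave
a = [1; 2; 3];
b = [4; 5; 6];

% Summation style: add up the element-wise products.
total_sum = sum(a .* b);

% Matrix style: (1x3) * (3x1) gives a 1x1 result with the same value.
total_mat = a' * b;

printf("sum: %d, matrix product: %d\n", total_sum, total_mat);
```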
In addition, the second big insight was keeping in mind the dimensions of my matrices. I knew theta was 2×1 to begin with, and two matrices can only be multiplied when the number of columns in the first equals the number of rows in the second. So a matrix that goes to the left of theta needs to be Something×2, and one that goes to the right of it needs to be 1×Something. Seeing as some of the matrices were 97×1, I just needed to transpose them to get the dimensions to line up.
To sum it up, these two insights got me through the lesson (assuming I used vector forms for my problems):
- The summation can be converted to matrix multiplication
- Look at the size of your matrices to determine how to convert them to the proper form to multiply them
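To show both insights together, here is a minimal sketch on made-up data. It is not the assignment's code: the toy data, the cost helper, the learning rate alpha, and the iteration count are all my own assumptions for illustration.

```octave
% Toy data lying exactly on the line y = 1 + 2x (made up for illustration).
x = [1; 2; 3];
y = [3; 5; 7];
m = length(y);

X = [ones(m, 1), x];   % m x 2: a column of ones for the intercept term
theta = zeros(2, 1);   % 2 x 1

% Hypothetical helper: squared-error cost in vector form.
% (X*theta - y) is m x 1, so err' * err is 1x1 and replaces the summation.
cost = @(X, y, theta) ((X * theta - y)' * (X * theta - y)) / (2 * length(y));

alpha = 0.1;           % learning rate, picked arbitrarily for this toy data
J_start = cost(X, y, theta);
for iter = 1:2000
  err = X * theta - y;            % m x 1
  grad = (X' * err) / m;          % (2 x m) * (m x 1) = 2 x 1, same shape as theta
  theta = theta - alpha * grad;
end
J_end = cost(X, y, theta);

disp(size(X' * (X * theta - y)));  % the product has the same dimensions as theta
printf("cost went from %.4f to %.6f\n", J_start, J_end);
```

Transposing X is what makes the shapes fit: X is m×2, the error vector is m×1, and only X' times the error produces something 2×1 that can be subtracted from theta.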
I really appreciate the graphs the lesson gives. The cost function is graphed so you can see that a certain value gives you a minimum… thus your hypothesis function works best when you use those values of theta. Graphing the data makes it so much easier to understand.
While this assignment was difficult because I struggled to get my data into the right vector form, I’m hoping future assignments will be a little easier now that I understand how to manipulate the data into the right vector representation. The programming part of this course (the implementation part rather than the theory part) was, as I expected, much more exciting. I enjoy theory, but only insofar as I can experiment with it in order to learn better. Hopefully the following weeks will continue to have programs I can use to help me understand the material.