This is a series where I’m discussing what I’ve learned in Coursera’s machine learning course taught by Andrew Ng of Stanford University. Why? See Machine Learning, Nanodegrees, and Bitcoin. I’m definitely not going into depth, just briefly summarizing from a 10,000-foot view.

This is a continuation of the review for week one.

## Why Review Linear Algebra?

I was a little curious about why we would be covering linear algebra in this course, but I think I figured it out. I believe there are two reasons:

- Linear algebra allows us to solve many equations all at once. Remember how we have a dataset that can help us determine a hypothesis function? Linear algebra allows us to use that entire dataset in one problem rather than solving for each data point separately.
- Matrices go well with Matlab/Octave and other programming languages. They can solve these problems in parallel, because matrix operations can take advantage of your computer’s multiple cores.

## Matrix/Vector Definition and Operations

Just as a side note, the notation I use for matrices may not be standard, because I did not want to take the time to learn WordPress formatting. The concepts are sound even where the notation is a little unusual.

#### Definitions

A matrix is a two-dimensional grouping of numbers, usually within square brackets (I’ve omitted them below).

    1 2 3
    4 5 6

This matrix has two rows and three columns, and is noted as a 2×3 matrix.

A vector is a matrix with only one row or only one column (so vectors are a special case of matrices). It can be written as a row:

    1 2 3

or as a column:

    1
    2
    3

When you refer to an element in a matrix, you specify row then column. So in our 2×3 matrix above, the element 2 (row 1, column 2) is A_{12}.
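As a minimal sketch of that row-then-column lookup, here is the 2×3 matrix stored as a nested list in plain Python (my choice for illustration; the course itself uses Matlab/Octave). The helper name `element` is made up for this post. Python lists count from 0, so the helper subtracts 1 from each index to match the 1-based notation.

```python
# The 2x3 matrix from above, stored as a list of rows.
A = [
    [1, 2, 3],
    [4, 5, 6],
]

def element(matrix, row, col):
    """Return the element at a 1-based (row, col) position, like A_{12}."""
    return matrix[row - 1][col - 1]

rows, cols = len(A), len(A[0])
print(rows, cols)        # 2 3
print(element(A, 1, 2))  # 2
```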

To add or subtract two matrices (which must have the same dimensions), just add or subtract the elements that are in the same position.

    [1 2]      [5 6]        [1+5 2+6]        [ 6  8]
    [3 4] plus [7 8] equals [3+7 4+8] equals [10 12]
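The same element-by-element rule can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the function names `same_shape`, `mat_add`, and `mat_sub` are hypothetical, and matrices are stored as lists of rows.

```python
def same_shape(a, b):
    """True when both matrices have the same number of rows and columns."""
    return len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))

def mat_add(a, b):
    """Add two equally sized matrices position by position."""
    if not same_shape(a, b):
        raise ValueError("matrices must have the same dimensions")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_sub(a, b):
    """Subtract b from a position by position."""
    if not same_shape(a, b):
        raise ValueError("matrices must have the same dimensions")
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

print(mat_add([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[6, 8], [10, 12]]
print(mat_sub([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[-4, -4], [-4, -4]]
```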

To multiply or divide a matrix by a scalar (a number, not a matrix/vector), multiply or divide every element in the matrix by that number.

        [1 2]   [3  6]
    3 * [3 4] = [9 12]
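Here is a small sketch of scalar multiplication and division in plain Python. The names `scalar_mul` and `scalar_div` are my own, not something from the course.

```python
def scalar_mul(k, matrix):
    """Multiply every element of the matrix by the scalar k."""
    return [[k * x for x in row] for row in matrix]

def scalar_div(matrix, k):
    """Divide every element of the matrix by the scalar k."""
    return [[x / k for x in row] for row in matrix]

print(scalar_mul(3, [[1, 2], [3, 4]]))   # [[3, 6], [9, 12]]
print(scalar_div([[3, 6], [9, 12]], 3))  # [[1.0, 2.0], [3.0, 4.0]]
```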

To multiply a matrix by a vector, the number of columns in the multiplicand (the matrix to the left of the multiplication symbol) must equal the number of rows in the multiplier (the matrix to the right of the multiplication symbol). The result will be a matrix with the multiplicand’s number of rows and the multiplier’s number of columns.

For example, a 4×5 matrix A times a 5×1 vector B will result in a 4×1 matrix C.

If that condition holds, you take each row of the matrix, multiply its elements by the matching elements of the vector, and add the results. For example…

    [1 2]       [5]        [1*5 + 2*6]        [ 5 + 12]        [17]
    [3 4] times [6] equals [3*5 + 4*6] equals [15 + 24] equals [39]
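The row-times-column rule can be sketched in plain Python as follows. The function name `mat_vec` is hypothetical, and I am assuming the column vector is stored as a flat list.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (flat list)."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("matrix columns must equal vector rows")
    # One output element per matrix row: multiply pairwise, then add.
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

print(mat_vec([[1, 2], [3, 4]], [5, 6]))  # [17, 39]
```

The result has one element per row of the matrix, which matches the dimension rule above.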

For matrix/matrix multiplication, you repeat the matrix/vector process for each column of the second matrix. Each column of the second matrix produces one column of the result. For example…

    [1 2]       [5 6]        [1*5 + 2*7  1*6 + 2*8]        [19 22]
    [3 4] times [7 8] equals [3*5 + 4*7  3*6 + 4*8] equals [43 50]

Matrix multiplication is not commutative, meaning A * B is generally not equal to B * A. However, it is associative, meaning D * (E * F) = (D * E) * F.
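Matrix/matrix multiplication and these two properties can be checked with a short plain-Python sketch. The name `mat_mul` and the third matrix `C` are my own choices for illustration.

```python
def mat_mul(a, b):
    """Multiply matrix a by matrix b (both stored as lists of rows)."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("columns of a must equal rows of b")
    cols = list(zip(*b))  # the columns of b, handled one vector at a time
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[0, 1], [1, 0]]

print(mat_mul(A, B))  # [[19, 22], [43, 50]]
print(mat_mul(B, A))  # [[23, 34], [31, 46]]  (different, so not commutative)

# Associative: grouping does not change the result.
print(mat_mul(mat_mul(A, B), C) == mat_mul(A, mat_mul(B, C)))  # True
```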

The identity matrix is the equivalent of multiplying by 1 for scalar values. So multiplying a matrix by the identity matrix just gives back that same matrix. It turns out the identity matrix is always square (the number of rows and columns are the same), and it is just a diagonal of 1s running from the top left to the bottom right. Everything else is 0. So, as an example, the 5×5 identity matrix is…

    1 0 0 0 0
    0 1 0 0 0
    0 0 1 0 0
    0 0 0 1 0
    0 0 0 0 1
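Here is a minimal plain-Python sketch that builds an identity matrix of any size and confirms that multiplying by it gives back the original matrix. The names `identity` and `mat_mul` are hypothetical helpers I wrote for this post.

```python
def identity(n):
    """Build the n x n identity matrix: 1s on the diagonal, 0s elsewhere."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def mat_mul(a, b):
    """Multiply matrix a by matrix b (both stored as lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

A = [[1, 2], [3, 4]]
I2 = identity(2)

print(mat_mul(A, I2) == A)  # True
print(mat_mul(I2, A) == A)  # True

for row in identity(5):
    print(row)
```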

## Inverse and Transpose

- The inverse of a matrix is the matrix you multiply it by in order to get the identity matrix.
- Not all matrices have an inverse.
- Only square matrices can potentially have an inverse.

- The transpose of a matrix is defined as swapping the row and column of every element, so the element in row i, column j moves to row j, column i. This is kind of like rotating it then flipping it backwards. It is denoted as A^{T}.

    1 2                  1 3 5
    3 4   transposed is  2 4 6
    5 6
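Both ideas in this section can be sketched in plain Python. One caveat: the inverse helper uses the standard closed-form formula for a 2×2 matrix, which is my addition for illustration and not something covered above. The names `transpose`, `inverse_2x2`, and `mat_mul` are hypothetical.

```python
def transpose(m):
    """Swap rows and columns: the element at (i, j) moves to (j, i)."""
    return [list(col) for col in zip(*m)]

def inverse_2x2(m):
    """Return the inverse of a 2x2 matrix, or None when it has no inverse."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        return None  # not every square matrix has an inverse
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_mul(a, b):
    """Multiply matrix a by matrix b (both stored as lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

print(transpose([[1, 2], [3, 4], [5, 6]]))  # [[1, 3, 5], [2, 4, 6]]

A = [[1, 2], [3, 4]]
A_inv = inverse_2x2(A)
print(A_inv)                          # [[-2.0, 1.0], [1.5, -0.5]]
print(mat_mul(A, A_inv))              # [[1.0, 0.0], [0.0, 1.0]]
print(inverse_2x2([[1, 2], [2, 4]]))  # None
```

Multiplying `A` by `A_inv` gives the identity matrix, and the last call shows a square matrix that has no inverse.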

## Conclusion

This part of the course also gave examples of Matlab/Octave commands that define matrices and perform operations on them. I won’t show those here until we actually start programming in class. My assumption is we won’t actually be doing very much linear algebra by hand. Instead, we will be using software to calculate it for us. But rather than just use the software, this course showed us how the software calculates those values.

Keep in mind, if this material interests you, you can join the course for free and follow along as I post these.