Evaluating a Learning Algorithm (Polynomial Regression and Classification)
This is based on week 3 of the Machine Learning Specialization on Coursera, taught by Andrew Ng. Note that you will not find any answers here; I am simply using Medium to solidify the maths and the coding required to evaluate a learning algorithm that has either high variance or high bias.
You might find this interesting if you like maths, or if you are a programmer looking to improve your maths skills. I am a terrible programmer and a terrible mathematician, but alas, if I can understand these concepts, you certainly can too.
Here is the problem: we have created a learning model with extremely high variance, which means it overfits the training data, and any subsequent new data given to the model is not predicted very well. This needs to be addressed, and we can start by splitting the data into a ‘training’ set and a ‘test’ set. This is a common technique: engineers fit the model’s parameters to the training data and then evaluate the model on the test data.
We can allocate 20–40% of our data set for testing, and scikit-learn’s train_test_split function can perform this split for us. We can do this using something like this:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

# let's print the shapes as well as some descriptions
print("X.shape", X.shape, "y.shape", y.shape)
print("X_train.shape", X_train.shape, "y_train.shape", y_train.shape)
print("X_test.shape", X_test.shape, "y_test.shape", y_test.shape)
X.shape (180,) y.shape (180,)
X_train.shape (120,) y_train.shape (120,)
X_test.shape (60,) y_test.shape (60,)
As you can see, the original X and y data sets contained 180 values. We defined the split using the test_size argument, set to 0.33, and lo and behold the data has been split: a third (60 values) for the test set and two thirds (120 values) for the training set. You might have noticed that .33 and .66 leave .01 unaccounted for; we do not need to worry about that, since they are just the recurring decimals 0.333… and 0.666… rounded off.
Now we must calculate the error for the model.
Error calculation for model evaluation
Look at this glorious math!
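The equation (shown as an image in the original post) is the squared-error cost over the test set; reconstructed here, where f(x⁽ⁱ⁾) is the model's prediction for example i:

```latex
J_{test} = \frac{1}{2 m_{test}} \sum_{i=1}^{m_{test}} \left( f\!\left(x^{(i)}_{test}\right) - y^{(i)}_{test} \right)^2
```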
If we break this down, it is quite easy. Forget about the math for a minute and let's take a look at what it is doing. Bridging the gap between math and code one step at a time is a wonderful way to conceptualise these ideas.
We need a few things for this. We need m, which is len(y), the number of examples in our data set. We also need an error variable, initialised to 0. This is what we currently have:
m = len(y)
error = 0
Now we need to iterate over each of the x_test and y test values so that we can calculate the summed error and add this to our error variable which currently equals 0.
for i in range(m):
    error_i = (y_pred[i] - y[i]) ** 2
    error += error_i

error = error / (2 * m)
I am well aware that I have not covered y_pred and I do not intend to here. This was already defined as part of the function and course and if I were to explain any more or write out the entire process it would be giving you the answer to the practical lab on the course I mentioned previously. They specifically request we do not do this.
Anyway, you can see that this loop iterates over all m values, calculating (y_pred[i] - y[i])² for each one and adding it to error. After the loop we simply divide the accumulated error by 2m. You should be able to map these values back to the glorious math we had before.
HINT: That funny-looking Σ (sigma) does the same thing as our for loop.
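As a sanity check, the same computation can be written without the loop using NumPy (a sketch; y_pred and y are assumed to be equal-length arrays):

```python
import numpy as np

def mse_error(y_pred, y):
    """Sum of squared differences divided by 2m, matching the loop above."""
    m = len(y)
    return np.sum((y_pred - y) ** 2) / (2 * m)

# Tiny check with made-up numbers:
y = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.5, 2.0])
print(mse_error(y_pred, y))  # (0 + 0.25 + 1.0) / 6 ≈ 0.2083
```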
Comparing performance on training and test data
Now, we build a high-degree polynomial model to minimise the training error. We need to do three things to implement this:
- Create and fit the model
- Compute the error on the training data.
- Compute the error on the test data.
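The three steps above can be sketched with scikit-learn's PolynomialFeatures and LinearRegression; the data here is made up for illustration, not the course's data set:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Made-up noisy quadratic data standing in for the course's data set.
rng = np.random.default_rng(1)
X = np.linspace(0, 4, 60).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 2, 60)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

# 1. Create and fit a high-degree polynomial model.
poly = PolynomialFeatures(degree=10)
model = LinearRegression().fit(poly.fit_transform(X_train), y_train)

# 2. Error on the training data.
train_err = np.mean((model.predict(poly.transform(X_train)) - y_train) ** 2) / 2
# 3. Error on the test data.
test_err = np.mean((model.predict(poly.transform(X_test)) - y_test) ** 2) / 2
print(train_err, test_err)  # the test error is typically much larger
```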
You can see that the error for the training set is a lot lower than for the test set. This is because our high-degree polynomial doesn’t predict new data very well; it only fits the data we give it. We could generally say that this model:
- Is overfit.
- Has high variance.
- Generalises poorly.
Now, we are going to use the train_test_split command again, calling it twice to get three splits: training, cross-validation and test.
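A minimal sketch of the double call; the 60/20/20 proportions are my assumption, not necessarily the course's exact values:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data with 180 values, like the set from earlier.
X = np.arange(180).reshape(-1, 1)
y = np.arange(180)

# First split off 40% as a temporary set...
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.40, random_state=1)
# ...then split that temporary set half-and-half into cross-validation and test sets.
X_cv, X_test, y_cv, y_test = train_test_split(X_temp, y_temp, test_size=0.50, random_state=1)

print(X_train.shape, X_cv.shape, X_test.shape)  # (108, 1) (36, 1) (36, 1)
```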
Bias and Variance
We could potentially use a lower-degree polynomial, but how exactly do we find the optimal degree? We can increase the degree of the polynomial on each iteration until we eventually find the best degree to use: the one which results in a good trade-off between fitting the training data and performing well on held-out data.
In Andrew Ng’s course, we used scikit-learn’s linear regression model.
Well, the first graph on the left looks messy, but the second graph actually gives us more information. We found that a 2nd-degree polynomial gives us the lowest error and the smallest discrepancy between the training set and the cross-validation set.
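The degree sweep might be sketched like this, again on made-up data (so the winning degree here is illustrative, not the course's result):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Made-up data that is roughly quadratic in x.
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 90).reshape(-1, 1)
y = 1 + 2 * x.ravel() + x.ravel() ** 2 + rng.normal(0, 1, 90)

x_train, x_cv, y_train, y_cv = train_test_split(x, y, test_size=0.4, random_state=0)

# Fit one model per candidate degree and record its cross-validation error.
errors = {}
for degree in range(1, 7):
    poly = PolynomialFeatures(degree)
    model = LinearRegression().fit(poly.fit_transform(x_train), y_train)
    pred_cv = model.predict(poly.transform(x_cv))
    errors[degree] = np.mean((pred_cv - y_cv) ** 2) / 2

# The degree with the lowest cross-validation error wins.
best = min(errors, key=errors.get)
print(best, errors[best])
```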
We can use the same process to tune regularisation, meaning we adjust the parameter lambda.
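The same sweep works for regularisation; sketched here with scikit-learn's Ridge, whose alpha parameter plays the role of lambda (the candidate values and data are my own assumptions):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Made-up data, with a deliberately high-degree feature expansion.
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 90).reshape(-1, 1)
y = 1 + x.ravel() ** 2 + rng.normal(0, 1, 90)

x_train, x_cv, y_train, y_cv = train_test_split(x, y, test_size=0.4, random_state=0)
poly = PolynomialFeatures(degree=10)
X_train, X_cv = poly.fit_transform(x_train), poly.transform(x_cv)

# Try a handful of lambda values and watch the cross-validation error.
for lam in [0.001, 0.01, 0.1, 1.0, 10.0]:
    model = Ridge(alpha=lam).fit(X_train, y_train)
    cv_err = np.mean((model.predict(X_cv) - y_cv) ** 2) / 2
    print(lam, cv_err)
```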
You might have guessed it, but we can also plot the error against the number of training examples, which is another way to diagnose overfitting. Here is the graph of the errors against the number of examples in our training set (m).
Next, we will evaluate a Neural Network.
Evaluating a Neural Network
Even though I am quite bad at math, I do enjoy it. Here is the algorithm for the particular neural network we will be working on:
This can be written as a for loop with an if statement nested inside.
This math is simply saying that for all the values within m, if yhat[i] is not equal to y[i], count a ‘1’. This happens multiple times over the data, and that is why we use a for loop. Each ‘1’ is added to a variable, which is then divided by the total number of data points we used. This returns the categorization error.
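The for loop with the nested if might look like this (yhat and y here are made-up label lists for illustration):

```python
def categorization_error(yhat, y):
    """Fraction of predictions that differ from the true labels."""
    m = len(y)
    incorrect = 0
    for i in range(m):
        if yhat[i] != y[i]:
            incorrect += 1  # the '1' from the formula
    return incorrect / m

# Two of the four predictions are wrong:
print(categorization_error([1, 0, 1, 1], [1, 1, 1, 0]))  # 2 mismatches out of 4 -> 0.5
```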
There is actually a lot more left of the assignment that this piece of writing is based on, but I will stop the post here. The rest of the assignment doesn’t align with what I had originally intended for this post. I hope you enjoyed reading this as much as I enjoyed writing it. It has been a pleasure learning about machine learning, coding and math, and I feel incredibly privileged to wake up every day and work on something that makes a difference, even if it is in a very small way.