Tuesday, September 6, 2016

Exercise 2: Regularized Logistic Regression Gradient Descent Algorithm

1. Definitions


  • Hypothesis: The logistic regression hypothesis is defined as:

        h_theta(x) = g(theta' * x)

    where g is the sigmoid function defined as:

        g(z) = 1 / (1 + e^(-z))

  • Cost Function:

        J(theta) = (1/m) * sum_{i=1..m} [ -y(i) * log(h_theta(x(i))) - (1 - y(i)) * log(1 - h_theta(x(i))) ]
                   + (lambda/(2m)) * sum_{j=1..n} theta_j^2

Watch out! Do NOT include theta0 in the regularization term above, because the theta0 parameter should not be regularized (note the regularization sum starts at j = 1).
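To make the "skip theta0" point concrete, here is a minimal Python sketch of the same regularized cost (an illustrative translation, not the exercise's Octave code; `cost_reg` is a hypothetical name):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_reg(theta, X, y, lam):
    """Regularized logistic regression cost. theta[0] is NOT penalized."""
    m = len(y)
    J = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, xi)))
        J += -yi * math.log(h) - (1 - yi) * math.log(1 - h)
    J /= m
    J += (lam / (2 * m)) * sum(t * t for t in theta[1:])  # regularize theta[1:] only
    return J
```

Note the slice `theta[1:]`: only theta1 through theta_n enter the penalty, matching the sum starting at j = 1.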

2. Gradient Descent Algorithm

Remarkably, this update rule looks identical to the one for regularized linear regression; only the hypothesis h differs.

Repeat {
    theta_j := theta_j - alpha * (dJ/dtheta_j),   j = 0, 1, 2, ..., n (update all thetas simultaneously)
}

The gradients are given below (theta0 is treated differently from the other thetas):

    dJ/dtheta_0 = (1/m) * sum_{i=1..m} (h_theta(x(i)) - y(i)) * x_0(i)
    dJ/dtheta_j = (1/m) * sum_{i=1..m} (h_theta(x(i)) - y(i)) * x_j(i) + (lambda/m) * theta_j,   for j >= 1

They can be rewritten with this vectorization: grad = (1/m) * X' * (h - y), then add (lambda/m) * theta_j to each component with j >= 1.
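The per-component gradient formulas can be sketched in Python as follows (an illustrative sketch, not the exercise's Octave; `grad_reg` is a hypothetical name):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_reg(theta, X, y, lam):
    """Gradient of the regularized cost; the theta[0] component has no lambda term."""
    m, n = len(y), len(theta)
    grad = [0.0] * n
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, xi)))
        for j in range(n):
            grad[j] += (h - yi) * xi[j] / m
    for j in range(1, n):  # j = 0 is skipped on purpose
        grad[j] += (lam / m) * theta[j]
    return grad
```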
Here is the code in Octave/MATLAB:
function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters. 

% Initialize some useful values
m = length(y);      % number of training examples
n = length(theta);  % number of parameters (including theta0)

% Cost Function J:
h = sigmoid(X*theta);
J = (-1/m) * sum((y .* log(h)) + ((1 - y) .* log(1-h)));

t = theta(2:n);
J = J + (lambda/(2*m)) * (t' * t);  %regularization on cost function

% gradient vector:
grad = (1/m) * (X' * (h - y));
grad(2:n) = grad(2:n) + (lambda/m)*t;  % regularization on gradients (skip theta(1))

end 

function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic 
%regression parameters theta
%   p = PREDICT(theta, X) computes the predictions for X using a 
%   threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)

p = (X*theta >=0);  % identical to p = sigmoid(X*theta) >= 0.5;

end

function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   J = SIGMOID(z) computes the sigmoid of z.

g = 1 ./ (1 + exp(-z));
end
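Putting the pieces together, the whole training loop can be sketched in Python on a toy dataset (an illustrative translation of the approach above, not the author's Octave; the data and names are made up):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y, lam):
    m = len(y)
    J = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, xi)))
        J += -yi * math.log(h) - (1 - yi) * math.log(1 - h)
    return J / m + (lam / (2 * m)) * sum(t * t for t in theta[1:])

def grad(theta, X, y, lam):
    m, n = len(y), len(theta)
    g = [0.0] * n
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, xi)))
        for j in range(n):
            g[j] += (h - yi) * xi[j] / m
    for j in range(1, n):
        g[j] += (lam / m) * theta[j]
    return g

def gradient_descent(theta, X, y, alpha, lam, iters):
    for _ in range(iters):
        g = grad(theta, X, y, lam)
        theta = [t - alpha * gj for t, gj in zip(theta, g)]  # simultaneous update
    return theta

# Toy data: each row is [1 (intercept), feature]; label is 1 when the feature is large.
X = [[1, 0.0], [1, 1.0], [1, 2.0], [1, 3.0]]
y = [0, 0, 1, 1]
theta = gradient_descent([0.0, 0.0], X, y, alpha=0.5, lam=0.1, iters=200)
```

The new theta is computed from the old gradient as a whole, which is exactly the "update all thetas simultaneously" requirement.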

3. Demo

I implemented regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to ensure it is functioning correctly. Suppose you are the product manager of the factory and you have the test results for some microchips on two different tests. From these two tests, you would like to determine whether the microchips should be accepted or rejected. To help you make the decision, you have a dataset of test results on past microchips, from which you can build a logistic regression model.

Below is the plot of the examples with the decision boundary created by my regularized logistic regression algorithm with lambda = 1. The input to the sigmoid function, theta'*x, is a cubic polynomial in the two test scores.
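The boundary is nonlinear because the two raw test scores are first mapped to polynomial features. A Python sketch of such a mapping (assuming degree 3 to match the cubic boundary above; `map_feature` is a hypothetical name):

```python
def map_feature(x1, x2, degree=3):
    """All monomials x1^i * x2^j with i + j <= degree, starting with the intercept 1."""
    out = []
    for total in range(degree + 1):
        for i in range(total + 1):
            out.append((x1 ** (total - i)) * (x2 ** i))
    return out
```

For degree 3 this produces 10 features (1, x1, x2, x1^2, x1*x2, x2^2, x1^3, ...), and theta then has 10 components, of which only the last 9 are regularized.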

4. Study Note
