Checking the gradient when doing gradient descent
I'm trying to implement a feed-forward, backpropagating autoencoder (trained with gradient descent), and I wanted to verify that I'm calculating the gradient correctly. The tutorial I'm following suggests calculating the derivative of each parameter one at a time: grad_i(theta) = (J(theta_i + epsilon) - J(theta_i - epsilon)) / (2*epsilon). I've written a sample piece of MATLAB code to do this, but without luck -- the differences between the gradient calculated from the derivative and the gradient found numerically tend to be large (>> 4 significant figures).
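For reference, here is a minimal, self-contained sketch of that centered-difference formula applied to a toy function (my own illustration, not from the tutorial):

    % Toy example: J(theta) = theta(1)^2 + 3*theta(2), whose exact
    % gradient is [2*theta(1); 3].
    J = @(theta) theta(1)^2 + 3*theta(2);
    theta = [0.5; -1.2];
    epsilon = 1e-4;
    numgrad = zeros(size(theta));
    for i = 1:numel(theta)
        e = zeros(size(theta)); e(i) = epsilon;   % perturb one parameter
        numgrad(i) = (J(theta + e) - J(theta - e)) / (2*epsilon);
    end
    % numgrad should match [2*theta(1); 3] to high precision.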
If anyone can offer suggestions, I would appreciate it (either on my calculation of the gradient or on how I perform the check). Because I've simplified the code to make it more readable, I haven't included biases, and I'm no longer tying the weight matrices.
First, I initialize the variables:
    numhidden = 200;
    numvisible = 784;
    low = -4*sqrt(6./(numhidden + numvisible));
    high = 4*sqrt(6./(numhidden + numvisible));
    encoder = low + (high-low)*rand(numvisible, numhidden);
    decoder = low + (high-low)*rand(numhidden, numvisible);
Next, given an input image x, I do feed-forward propagation:
    a = sigmoid(x*encoder);
    z = sigmoid(a*decoder);   % (the reconstruction of x)
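(Here sigmoid is assumed to be the usual elementwise logistic function; it isn't defined in the post, but something like this works:)

    % Elementwise logistic sigmoid, applied to a matrix of pre-activations
    sigmoid = @(h) 1./(1 + exp(-h));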
The loss function I'm using is the standard squared error, Σ(0.5*(z - x)^2):
    % First calculate the error by finding the derivative of
    % sum(0.5*(z - x).^2), which is (f(h) - x)*f'(h), where z = f(h),
    % h = a*decoder, and f = sigmoid. Since the derivative of the
    % sigmoid is sigmoid*(1 - sigmoid), we get:
    error_0 = (z - x).*z.*(1-z);

    % The gradient: \delta w_{ji} = error_j * a_i
    gdecoder = error_0'*a;

    % Not important here, but included for completeness:
    % back-propagate one layer down
    error_1 = (error_0*encoder).*a.*(1-a);
    gencoder = error_1'*x;
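In LaTeX terms, this is just a restatement of the comments above (with \odot denoting elementwise multiplication and W_dec the decoder matrix):

\[
J = \tfrac{1}{2}\sum_j (z_j - x_j)^2, \qquad z = \sigma(h), \qquad h = a\,W_{\mathrm{dec}},
\]
\[
\frac{\partial J}{\partial h} = (z - x)\odot\sigma'(h) = (z - x)\odot z\odot(1 - z),
\]

which is exactly error_0.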
And finally, I check that the gradient is correct (in this case, for the decoder):
    epsilon = 10e-5;
    check = gdecoder(:);      % the values obtained above
    for i = 1:size(decoder(:), 1)
        % calculate J+
        theta = decoder(:);   % unroll
        theta(i) = theta(i) + epsilon;
        decoderp = reshape(theta, size(decoder));  % re-roll
        a = sigmoid(x*encoder);
        z = sigmoid(a*decoderp);
        Jp = sum(0.5*(z - x).^2);

        % calculate J-
        theta = decoder(:);
        theta(i) = theta(i) - epsilon;
        decoderp = reshape(theta, size(decoder));
        a = sigmoid(x*encoder);
        z = sigmoid(a*decoderp);
        Jm = sum(0.5*(z - x).^2);

        grad_i = (Jp - Jm) / (2*epsilon);
        diff = abs(grad_i - check(i));
        fprintf('%d: %f <=> %f: %f\n', i, grad_i, check(i), diff);
    end
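As an aside (not in the original post), absolute differences can be hard to judge when gradient entries vary in scale. A common variant is to also store each grad_i into a vector, say numgrad (a hypothetical name for this sketch), and report a single relative error:

    % Assumes each grad_i above was stored as numgrad(i) inside the loop.
    relerr = norm(numgrad - check) / norm(numgrad + check);
    % For a correct analytic gradient and epsilon around 1e-4, this is
    % typically very small (roughly 1e-8 or less); values near 1 indicate
    % a genuine mismatch rather than numerical noise.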
Running this on the MNIST dataset (for the first entry) gives results such as the following (format: index: numerical gradient <=> analytic gradient: absolute difference):
    2: 0.093885 <=> 0.028398: 0.065487
    3: 0.066285 <=> 0.031096: 0.035189
    5: 0.053074 <=> 0.019839: 0.033235
    6: 0.108249 <=> 0.042407: 0.065843
    7: 0.091576 <=> 0.009014: 0.082562
Answer: Do not apply the sigmoid to both a and z. Use it only on z:
    a = x*encoder;
    z = sigmoid(a*decoderp);
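One practical note (my addition, not part of the original answer): whichever forward pass you settle on, the numerical check has to recompute J with that same forward pass, e.g. inside the loop:

    a = x*encoder;              % hidden layer without the sigmoid
    z = sigmoid(a*decoderp);    % perturbed decoder, as before
    Jp = sum(0.5*(z - x).^2);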