Checking the gradient when doing gradient descent
I'm trying to implement a feed-forward, backpropagating autoencoder (trained with gradient descent), and I wanted to verify that I'm calculating the gradient correctly. The tutorial I'm following suggests calculating the derivative of each parameter one at a time: grad_i(theta) = (J(theta_i + epsilon) - J(theta_i - epsilon)) / (2*epsilon). I've written a sample piece of MATLAB code to do this, but without luck -- the differences between the gradient calculated from the derivative and the gradient found numerically tend to be large (>> 4 significant figures).
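For reference, here is a minimal, self-contained sketch of that centered-difference formula applied to a toy function (my own illustration, not from the tutorial):

    % Toy example: J(theta) = theta(1)^2 + 3*theta(2), whose exact
    % gradient is [2*theta(1); 3].
    J = @(theta) theta(1)^2 + 3*theta(2);
    theta = [0.5; -1.2];
    epsilon = 1e-4;
    numgrad = zeros(size(theta));
    for i = 1:numel(theta)
        e = zeros(size(theta)); e(i) = epsilon;   % perturb one parameter
        numgrad(i) = (J(theta + e) - J(theta - e)) / (2*epsilon);
    end
    % numgrad should match [2*theta(1); 3] to high precision.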
If anyone can offer suggestions, I would appreciate it (either on my calculation of the gradient or on how I perform the check). Because I've simplified the code to make it more readable, I haven't included biases, and I'm no longer tying the weight matrices.
First, I initialize the variables:
    numhidden = 200;
    numvisible = 784;
    low = -4*sqrt(6./(numhidden + numvisible));
    high = 4*sqrt(6./(numhidden + numvisible));
    encoder = low + (high-low)*rand(numvisible, numhidden);
    decoder = low + (high-low)*rand(numhidden, numvisible);
Next, given an input image x, I do feed-forward propagation:
    a = sigmoid(x*encoder);
    z = sigmoid(a*decoder);   % (the reconstruction of x)
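(Here sigmoid is assumed to be the usual elementwise logistic function; it isn't defined in the post, but something like this works:)

    % Elementwise logistic sigmoid, applied to a matrix of pre-activations
    sigmoid = @(h) 1./(1 + exp(-h));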
The loss function I'm using is the standard squared error, Σ(0.5*(z - x)^2):
    % First calculate the error by finding the derivative of
    % sum(0.5*(z - x).^2), which is (f(h) - x)*f'(h), where z = f(h),
    % h = a*decoder, and f = sigmoid. Since the derivative of the
    % sigmoid is sigmoid*(1 - sigmoid), we get:
    error_0 = (z - x).*z.*(1-z);

    % The gradient: \delta w_{ji} = error_j * a_i
    gdecoder = error_0'*a;

    % Not important here, but included for completeness:
    % back-propagate one layer down
    error_1 = (error_0*encoder).*a.*(1-a);
    gencoder = error_1'*x;
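In LaTeX terms, this is just a restatement of the comments above (with \odot denoting elementwise multiplication and W_dec the decoder matrix):

\[
J = \tfrac{1}{2}\sum_j (z_j - x_j)^2, \qquad z = \sigma(h), \qquad h = a\,W_{\mathrm{dec}},
\]
\[
\frac{\partial J}{\partial h} = (z - x)\odot\sigma'(h) = (z - x)\odot z\odot(1 - z),
\]

which is exactly error_0.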
And finally, I check that the gradient is correct (in this case, for the decoder):
    epsilon = 10e-5;
    check = gdecoder(:);      % the values obtained above
    for i = 1:size(decoder(:), 1)
        % calculate J+
        theta = decoder(:);   % unroll
        theta(i) = theta(i) + epsilon;
        decoderp = reshape(theta, size(decoder));  % re-roll
        a = sigmoid(x*encoder);
        z = sigmoid(a*decoderp);
        Jp = sum(0.5*(z - x).^2);

        % calculate J-
        theta = decoder(:);
        theta(i) = theta(i) - epsilon;
        decoderp = reshape(theta, size(decoder));
        a = sigmoid(x*encoder);
        z = sigmoid(a*decoderp);
        Jm = sum(0.5*(z - x).^2);

        grad_i = (Jp - Jm) / (2*epsilon);
        diff = abs(grad_i - check(i));
        fprintf('%d: %f <=> %f: %f\n', i, grad_i, check(i), diff);
    end
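As an aside (not in the original post), absolute differences can be hard to judge when gradient entries vary in scale. A common variant is to also store each grad_i into a vector, say numgrad (a hypothetical name for this sketch), and report a single relative error:

    % Assumes each grad_i above was stored as numgrad(i) inside the loop.
    relerr = norm(numgrad - check) / norm(numgrad + check);
    % For a correct analytic gradient and epsilon around 1e-4, this is
    % typically very small (roughly 1e-8 or less); values near 1 indicate
    % a genuine mismatch rather than numerical noise.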
Running this on the MNIST dataset (for the first entry) gives results such as the following (format: index: numerical gradient <=> analytic gradient: absolute difference):
    2: 0.093885 <=> 0.028398: 0.065487
    3: 0.066285 <=> 0.031096: 0.035189
    5: 0.053074 <=> 0.019839: 0.033235
    6: 0.108249 <=> 0.042407: 0.065843
    7: 0.091576 <=> 0.009014: 0.082562
Answer: Do not apply the sigmoid to both a and z. Use it only on z:
    a = x*encoder;
    z = sigmoid(a*decoderp);
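One practical note (my addition, not part of the original answer): whichever forward pass you settle on, the numerical check has to recompute J with that same forward pass, e.g. inside the loop:

    a = x*encoder;              % hidden layer without the sigmoid
    z = sigmoid(a*decoderp);    % perturbed decoder, as before
    Jp = sum(0.5*(z - x).^2);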