Feed-forward backpropagation networks (also known as three-layer feed-forward networks) have been very successful. Uses include teaching neural networks to play games, speak, and recognize things. Backpropagation can be used with several network architectures. The networks are all highly interconnected and use non-linear transfer functions. A network must have at least three layers (input, hidden, and output), but rarely needs more than three.
Backpropagation is a supervised training method for feed-forward neural nets that uses pairs of input and output patterns. The weights on all the connections are first set to random values. An input is then fed to the net and propagates forward to the output layer, where the errors are calculated. The error correction is then propagated back from the output layer through the hidden layer toward the input layer. There is one input neurode for each number (dimension) in the input vector and one output neurode for each dimension in the output vector, so the network maps IN-dimensional space to OUT-dimensional space. There is no set rule for determining the number of hidden layers or the number of neurodes in a hidden layer. However, if too few hidden neurodes are chosen, the network cannot learn. If too many are chosen, the network memorizes the patterns rather than learning to extract the relevant information. A rule of thumb is to choose log2(X) hidden neurodes, where X is the number of patterns. So if you have 8 distinct patterns to be learned, then log2(8) = 3 and 3 hidden neurodes are probably needed. This is just a rule of thumb; experiment to see what works best for your situation.
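The rule of thumb above can be sketched in a few lines of Python. The function name suggested_hidden_neurodes is made up for this illustration, and rounding up when X is not a power of two is an assumption, since the rule itself does not say what to do in that case.

```python
import math


def suggested_hidden_neurodes(num_patterns: int) -> int:
    """Rule-of-thumb starting point: log2 of the number of patterns.

    Rounding up (and never returning less than 1) is an assumption
    made for this sketch, not part of the rule itself.
    """
    if num_patterns < 1:
        raise ValueError("num_patterns must be at least 1")
    return max(1, math.ceil(math.log2(num_patterns)))


if __name__ == "__main__":
    for patterns in (2, 8, 9, 100):
        print(patterns, "patterns ->", suggested_hidden_neurodes(patterns), "hidden neurodes")
```

Treat the result only as a first guess and adjust it by experiment.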
The error is driven toward zero during training. It is calculated as: Error = 1/2 * sum((desired - actual)^2), summed over the output neurodes. To get the error close to zero, within a tolerance, we use iteration. On each iteration we move a step downhill: we take the gradient of the error with respect to the weights and follow the direction of steepest descent to minimize the error. So newWeight = oldWeight + stepSize * (-gradient_W(E(W))).
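As a rough illustration of the error measure and the steepest descent step, here is a minimal Python sketch. The helper names sum_squared_error and descent_step are invented for this example, and the gradient is assumed to be supplied by the caller.

```python
def sum_squared_error(desired, actual):
    """Error = 1/2 * sum((desired - actual)^2) over the output values."""
    return 0.5 * sum((d - a) ** 2 for d, a in zip(desired, actual))


def descent_step(weights, gradient, step_size):
    """newWeight = oldWeight + stepSize * (-gradient), applied to each weight."""
    return [w - step_size * g for w, g in zip(weights, gradient)]


if __name__ == "__main__":
    # Toy problem (an assumption for the demo): one weight w, output = w,
    # desired = 3.0, so E(w) = 1/2 * (3 - w)^2 and dE/dw = -(3 - w).
    w = [0.0]
    for _ in range(50):
        grad = [-(3.0 - w[0])]
        w = descent_step(w, grad, step_size=0.2)
    print("weight:", w[0], "error:", sum_squared_error([3.0], w))
```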
The derivative of the transfer function T(x) = 1/(1 + e^-x) is just T(x) * (1 - T(x)), so using the chain rule we arrive at the error correction term for an output node: (desired - actual) * actual * (1 - actual). This term is passed back across eachNodeOutWeight to the hidden nodes and from there to eachNodeHiddenWeight, and each weight is changed in proportion to the error correction that reaches it as it propagates back through the network.
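The derivative identity can be checked numerically. The sketch below compares T(x) * (1 - T(x)) against a central finite difference; the function names and the step size h are choices made for this illustration.

```python
import math


def sigmoid(x):
    """Transfer function T(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Analytic derivative: T(x) * (1 - T(x))."""
    t = sigmoid(x)
    return t * (1.0 - t)


def numeric_derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    for x in (-2.0, 0.0, 0.5, 3.0):
        print(x, sigmoid_derivative(x), numeric_derivative(sigmoid, x))
```

The two columns should agree to several decimal places, and the derivative peaks at x = 0 where T(x) = 0.5.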
To train the net, all weights are randomly set to a value between -1.0 and 1.0.
To do the calculations going forward through the net:
Each NodeInput is multiplied by each weight connected to it
Each HiddenNode sums up these incoming weighted inputs and adds a bias to the total
This value is used in the sigmoid function as x { 1/(1 + e^-x) }
If this value is greater than the threshold the HiddenNode fires this value, else it fires zero
Each HiddenNode output is multiplied by each weight connected to it
Each OutputNode sums up these incoming weighted values and adds a bias to the total
This value is used in the sigmoid function as x { 1/(1 + e^-x) }
This is the value output by the OutputNode
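The forward steps above can be sketched as follows. The weight layout (one row of weights per receiving node) and the fire_threshold parameter are assumptions made for this illustration; with the default of 0.0 every hidden node fires, because the sigmoid is always positive.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_output(inputs, weights, biases):
    """weights[j][i] is the weight from input i to node j (assumed layout)."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, w_hidden, b_hidden, w_out, b_out, fire_threshold=0.0):
    """Return (hidden values, output values) for one input pattern."""
    hidden = layer_output(inputs, w_hidden, b_hidden)
    # A hidden node fires its value only if it exceeds the threshold.
    hidden = [h if h > fire_threshold else 0.0 for h in hidden]
    outputs = layer_output(hidden, w_out, b_out)
    return hidden, outputs


if __name__ == "__main__":
    # 2 inputs, 2 hidden nodes, 1 output node; the values are arbitrary.
    hidden, outputs = forward(
        [1.0, 0.0],
        w_hidden=[[0.5, -0.5], [-0.3, 0.8]],
        b_hidden=[0.1, -0.1],
        w_out=[[0.7, -0.2]],
        b_out=[0.05],
    )
    print("hidden:", hidden)
    print("outputs:", outputs)
```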
To calculate the adjustments during training, you figure out the error and propagate it back like this:
Adjust weights between HiddenNodes and OutputNodes
ErrorOut = (OutputNode)*(1-OutputNode)*(DesiredOutput - OutputNode)
ErrorHidden = (HiddenNode)*(1-HiddenNode)*(Sum { ErrorOut*Weight + ErrorOut*Weight … } ) for each weight connecting this node to an OutputNode (use the weights as they were before this adjustment)
LearningRate = LearningConstant * HiddenNode
(LearningConstant is usually set to something around 0.2 )
Adjustment = ErrorOut * LearningRate
Weight = Weight + Adjustment (added, because ErrorOut is computed from DesiredOutput - OutputNode)
Adjust weights between HiddenNodes and InputNodes
Adjustment = (ErrorHidden)*(LearningConstant)*(NodeInput)
Weight = Weight + Adjustment
Adjust Threshold
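On OutputNode, Threshold = Threshold - ErrorOut * LearningConstant
On HiddenNode, Threshold = Threshold - ErrorHidden * LearningConstant
(The threshold here is the value subtracted from a node's weighted sum, i.e. the negative of the bias. Its input is a constant 1, so only LearningConstant scales the step.)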
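Putting the forward and backward passes together, one training step might look like the sketch below. It is an illustration under stated assumptions, not a reference implementation: the function names are invented, the firing threshold on hidden nodes is left out so the sigmoid derivative applies everywhere, and each node carries a bias that is added to its weighted sum (equivalent to subtracting a threshold). Because the error terms are computed from (desired - actual), each adjustment is added to its weight.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_nodes, n_inputs, rng):
    """Random weights and biases between -1.0 and 1.0."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(n_nodes)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_nodes)]
    return weights, biases


def forward(inputs, w_hid, b_hid, w_out, b_out):
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hid, b_hid)]
    outputs = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
               for row, b in zip(w_out, b_out)]
    return hidden, outputs


def train_step(inputs, desired, w_hid, b_hid, w_out, b_out, learning_constant=0.2):
    """Update the weights in place; return the error measured before the update."""
    hidden, outputs = forward(inputs, w_hid, b_hid, w_out, b_out)
    # ErrorOut = out * (1 - out) * (desired - out)
    error_out = [o * (1.0 - o) * (d - o) for o, d in zip(outputs, desired)]
    # ErrorHidden uses the output weights as they were before adjustment.
    error_hid = [h * (1.0 - h) * sum(error_out[k] * w_out[k][j] for k in range(len(w_out)))
                 for j, h in enumerate(hidden)]
    for k, row in enumerate(w_out):
        for j in range(len(row)):
            row[j] += learning_constant * error_out[k] * hidden[j]
        b_out[k] += learning_constant * error_out[k]
    for j, row in enumerate(w_hid):
        for i in range(len(row)):
            row[i] += learning_constant * error_hid[j] * inputs[i]
        b_hid[j] += learning_constant * error_hid[j]
    return 0.5 * sum((d - o) ** 2 for d, o in zip(desired, outputs))


if __name__ == "__main__":
    rng = random.Random(0)
    w_hid, b_hid = make_layer(3, 2, rng)
    w_out, b_out = make_layer(1, 3, rng)
    for step in range(5):
        print(step, train_step([1.0, 0.0], [1.0], w_hid, b_hid, w_out, b_out))
```

Calling train_step repeatedly over all the training pairs, until the returned error falls within a tolerance, gives the iterative training loop described earlier.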
If you use a neural net that also handles complex (imaginary) numbers, you can adapt the transfer function so that it is not always positive and calculate all four of the partial derivatives needed.
Numerous iterations are required for a backpropagation network to learn. Therefore it is not practical for neural nets that must learn in 'real time'. It will not always arrive at a correct set of weights. It may get trapped in a local minimum rather than reaching the global minimum. This is a problem with the 'steepest descent' algorithm. A momentum term that allows the calculation to slide over small bumps is sometimes employed. Backpropagation networks do not scale well. They are only good for small neural nets.
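A momentum term can be sketched as follows. This is one common formulation, given here as an assumption rather than the specific variant the text has in mind; the function name and the default values for step_size and momentum are illustrative.

```python
def momentum_step(weight, gradient, velocity, step_size=0.2, momentum=0.9):
    """One steepest-descent step with momentum.

    The velocity keeps a fraction of the previous change, so the weight
    keeps moving across flat spots and small bumps in the error surface.
    Returns (new_weight, new_velocity).
    """
    velocity = momentum * velocity - step_size * gradient
    return weight + velocity, velocity


if __name__ == "__main__":
    w, v = 1.0, 0.0
    w, v = momentum_step(w, gradient=0.5, velocity=v)
    print("after a downhill step:", w, v)
    # With a zero gradient the weight still moves, carried by the velocity.
    w, v = momentum_step(w, gradient=0.0, velocity=v)
    print("after a flat step:", w, v)
```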