Posted by ljmacphee on February 28, 2007 under neural networks, topics in artificial intelligence
Perceptron
Rosenblatt added a learning law to the McCulloch-Pitts neurode to create the perceptron, which is the first of the neural net learning models. The perceptron has one layer of inputs and one layer of outputs, but only one group of weights. If data points on a plot are linearly separable (we can draw a straight line separating points that belong in different categories), then we can use this learning method to teach the neural net to properly separate the data points.
The McCulloch-Pitts neurode fires a +1 if the neurode’s total input (the sum of each input * its weight, plus some bias) is greater than the set threshold. If it is less than the set threshold, or if there is any inhibitory input, a -1 is fired. Suppose the inputs are themselves +1 or -1, the weight is 1 for each of two inputs, and the threshold is zero. If the bias is chosen to be -0.5 then the neurode works as an AND function. If the bias is chosen to be 0.5 then the neurode acts as an OR function. With a single input whose weight is -1 and a bias of 0.5 it behaves as a NOT operator. Any logical function can be created using only AND, OR and NOT gates, so a neural net can be created with McCulloch-Pitts neurodes to solve any logical function.
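Here is a minimal sketch of a threshold neurode acting as a logic gate. It assumes inputs coded as +1 (true) and -1 (false), a threshold of zero, and no inhibitory inputs. The function names and the particular bias values (-0.5 for AND, 0.5 for OR, a weight of -1 for NOT) are my own choices that happen to work under that coding, not the only possible ones.

```python
def neurode(inputs, weights, bias, threshold=0.0):
    """Fire +1 if the weighted input sum plus bias exceeds the threshold, else -1."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > threshold else -1

def AND(a, b):
    # Only (+1, +1) pushes the sum above zero: 2 - 0.5 > 0.
    return neurode([a, b], [1, 1], bias=-0.5)

def OR(a, b):
    # Only (-1, -1) keeps the sum at or below zero: -2 + 0.5 < 0.
    return neurode([a, b], [1, 1], bias=0.5)

def NOT(a):
    # A negative weight flips the sign of the single input.
    return neurode([a], [-1], bias=0.5)

for a in (-1, 1):
    for b in (-1, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
```

With a different input coding (say 0 and 1) the same neurode still works, but the biases have to be chosen again.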
We start with a weight vector that has its tail at the origin and its head at a randomly picked point. Each data point is input to the neurode and it responds with either +1 or -1. If the response is wrong, the input vector multiplied by the correct output is added to the weight vector. This is repeated until all data points have been input and the neurode gives the correct output for each point.
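The training loop can be sketched roughly as follows. This is the usual textbook perceptron rule, under a few assumptions of mine: the weights start at zero instead of a random point (so the run is repeatable), the bias is folded in as a third weight on a constant input of 1.0, and the six sample points are made up for illustration.

```python
def train_perceptron(points, labels, max_epochs=100):
    """points: list of (x, y); labels: +1 or -1. Returns [w_x, w_y, bias]."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        errors = 0
        for (x, y), target in zip(points, labels):
            inputs = (x, y, 1.0)  # the constant 1.0 carries the bias
            total = sum(w * v for w, v in zip(weights, inputs))
            output = 1 if total > 0 else -1
            if output != target:
                # Add the input vector, multiplied by the correct output.
                weights = [w + target * v for w, v in zip(weights, inputs)]
                errors += 1
        if errors == 0:  # every point classified correctly, so stop
            break
    return weights

def predict(weights, point):
    x, y = point
    return 1 if weights[0] * x + weights[1] * y + weights[2] > 0 else -1

# Hypothetical linearly separable data: one cluster per class.
points = [(2, 2), (3, 1), (2.5, 3), (-1, -1), (-2, 0), (0, -2)]
labels = [1, 1, 1, -1, -1, -1]
weights = train_perceptron(points, labels)
print(weights, [predict(weights, p) for p in points])
```

If the data are not linearly separable the loop never reaches zero errors, which is why the sketch caps the number of passes with max_epochs.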
The perceptron fell out of favor since it can only handle linearly separable functions, which means simple functions like XOR or parity cannot be computed by it. Minsky and Papert published a book, ‘Perceptrons’, in 1969, that proved that one and two layer neural nets could not handle many real world problems, and research in neural nets fell off for about twenty years.
An additional layer and set of weights can enable the perceptron to handle functions that are not linearly separable. A separate layer is needed for each vertex needed to separate the function. A 1950’s paper by A.N. Kolmogorov gave a proof, later applied to neural networks, that a three layer network could perform any mapping exactly between any two sets of numbers.
Multi layered perceptrons were developed that can handle XOR functions. Hidden layers are added and they are trained using backpropagation or a similar training algorithm. Using one layer, linearly separable problems can be solved. Using two layers, regions can be sorted, and with three layers enclosed regions can be sorted.
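To see why one hidden layer is enough for XOR, here is a small sketch with hand-set weights. Nothing is trained here; a real multi layered perceptron would learn its weights with backpropagation. I am assuming inputs coded as +1 and -1, and the hidden units (an OR and a NAND) and their biases are my own picks.

```python
def unit(inputs, weights, bias):
    """Threshold unit: +1 if the weighted sum plus bias is positive, else -1."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else -1

def xor_net(a, b):
    # Hidden layer: two units, each drawing one straight line.
    h_or = unit([a, b], [1, 1], bias=0.5)       # fires unless both are -1
    h_nand = unit([a, b], [-1, -1], bias=0.5)   # fires unless both are +1
    # Output layer: AND of the two hidden units.
    return unit([h_or, h_nand], [1, 1], bias=-0.5)

for a in (-1, 1):
    for b in (-1, 1):
        print(a, b, xor_net(a, b))
```

Each hidden unit separates the plane with one line, and the output unit keeps only the strip between the two lines, which is exactly where the two XOR-true points sit.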
Posted by ljmacphee on February 27, 2007 under neural networks, topics in artificial intelligence
Hebbian Learning
“When the axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.” [D.O.Hebb, The Organization of Behavior]
In other words, in a neural net, the connections between neurodes get larger weights if they are repeatedly used during training.
There are adjustments that have been made to this rule. Weights are bounded between -1.0 and 1.0. Connections that are not used are decreased in value. Neohebbian Learning takes this into consideration. It iteratively computes each node’s connection weights using NewWeight = OldWeight - F*ForgottenWeight + N*NewLearningWeight. F and N are constants between 0 and 1.0, F being how quickly to forget and N being how quickly to learn.
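A rough sketch of one such update is below. The post does not spell out what ForgottenWeight and NewLearningWeight are, so I am assuming the common reading: the forgotten part is the old weight itself, and the newly learned part is the product of the sending and receiving neurodes' activities. The function and parameter names are made up for illustration.

```python
def neo_hebbian_update(old_weight, pre, post, forget_rate=0.1, learn_rate=0.5):
    """One neo-Hebbian step for a single connection.

    pre, post: activity of the sending and receiving neurodes (assumed inputs).
    forget_rate is F and learn_rate is N, both between 0 and 1.0.
    """
    forgotten = forget_rate * old_weight   # assumption: decay of the old weight
    learned = learn_rate * pre * post      # assumption: plain Hebbian product
    new_weight = old_weight - forgotten + learned
    # Keep the weight bounded between -1.0 and 1.0.
    return max(-1.0, min(1.0, new_weight))

w = 0.0
for _ in range(3):                        # both neurodes active together
    w = neo_hebbian_update(w, pre=1.0, post=1.0)
    print("used:", w)
w = neo_hebbian_update(w, pre=0.0, post=1.0)  # connection not used
print("unused:", w)
```

Repeated use drives the weight up to the bound, and a cycle without use lets it decay by the forgetting rate.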
Differential Hebbian Learning adjusts the learning and forgetting in proportion to the amount of change in the neurodes’ outputs since the last cycle, which is just the derivative of each neurode’s output over time.
Drive reinforcement theory, developed by Harry Klopf, is a learning system that modifies differential Hebbian learning. The weight increase depends on the product of the change in the output signal of the receiving neurode and the weighted sum of the inputs over time. This allows some temporal learning to occur in the system. This system is closer to the classical conditioning studied by Pavlov.
See also:
Reinforcement Learning
Posted by ljmacphee on February 26, 2007 under neural networks, source code, topics in artificial intelligence
Neural Networks
Neural nets are good at doing what computers traditionally do not do well: pattern recognition. They are good for sorting data, classifying information, speech recognition, diagnosis, and predictions of non-linear phenomena. Neural nets are not programmed but learn from examples, either with or without supervised feedback.
Modeled after the human brain, they give more weight to connections used frequently and reduce the size (weight) of connections not used. Some neural nets must be supervised while learning: they are given data to sort and given feedback as to whether the data is correctly sorted. Feed-forward backpropagation networks are the best understood and most successful of these. Some, such as self organizing networks, figure things out for themselves.
If a neural net is too large it will memorize rather than learn. Neural nets usually are composed of three layers: input, hidden, and output. More layers can be added, but usually little is gained from doing so. The connections vary by the network type. Some nets have connections from each node in one layer to the next, some have backward connections to the previous layer, and some have connections within the same layer.
McCulloch and Pitts, in 1943, proved that networks comprised of neurodes could represent any finite logical expression. In 1949 Hebb defined a method for updating the weights in neural networks. Kolmogorov’s Theorem was published in the 1950’s. It states that any mapping between two sets of numbers can be exactly done with a three layer neural network. He did not refer to neural networks in his paper; this was applied later. His paper also describes how the network is to be constructed. The input layer has one neurode for every input. These neurodes have a connection to each neurode in the hidden layer. The hidden layer has (2*n + 1) neurodes, where n is the number of inputs. The hidden layer sums a set of continuous real monotonically increasing functions, like the sigmoid function. The output layer has one neurode for every output.
Rosenblatt in 1961 developed the Perceptron ANN (artificial neural network). In the 1960’s Cooley and Tukey devised the Fast Fourier Transform algorithm, which made signal processing with neural networks feasible. Widrow and Hoff then developed Adaline. 1969 was the year neural networks almost died. A book published by Minsky and Papert showed that the XOR function could not be done with the Adaline and other similar networks. 1972 brought new interest when Kohonen and Anderson independently published papers about networks that learned without supervision, SOM (self organizing maps). Grossberg and Carpenter developed ART (adaptive resonance theory), which learns without supervision, in the late 1960’s. The 1970’s brought the Neocognitron, for visual pattern recognition. Rumelhart, McClelland and the PDP Research Group published PDP (“Parallel Distributed Processing”) in three volumes. These books described neural networks in a way that was easy to understand.
Neural networks map sets of inputs to sets of outputs. Learning is what shapes the neural network’s surface. Supervised learning algorithms take inputs and match them to outputs, correcting the network if the output does not match the desired output. Unsupervised learning algorithms do not correct the output given by the neural net. The net is provided with inputs, but not with outputs.
Training data for a neural net should be fairly representative of the actual data that will be used. All possibilities should be covered and the proportion of data in each area should match the proportion in the real data. Ways of training neural nets:
Hard coded weights determined by experience or mathematical formulas can serve in place of a training algorithm.
Supervised training uses input and matching output patterns to let the net set the weights.
Graded training only uses input patterns, but then the neural net receives feedback on how accurate its answer is.
Unsupervised training uses only input patterns; the neural net’s output is then taken as the correct answer.
Autonomous learning in neural nets is different from other unsupervised learning systems in that the neural net can learn selectively. It doesn’t learn every pattern input, only those that are ‘important’. An autonomous learning neural net has the following capabilities: it organizes information into categories without outside input and will reorganize them if it makes sense to do so; it retrieves information from less than perfect input; it is configured to work in parallel to keep speed reasonable; it is always selectively learning; the priorities it gives to input patterns can change; it can generalize; it has more memory space than it needs; and it can expand and add to its knowledge rather than overwriting previously learned knowledge. Of course something this wonderful should also make your coffee and sort your email for you too.
The delta rule is used for error correction in backpropagation networks. It is also known as the least mean squares (LMS) rule: NewWeight = OldWeight + LearningConstant*NeurodeOutput*(desiredOutput-actualOutput). The delta rule uses local information for error correction. This rule looks for a minimum of the error. In an effort to find a minimum it may find a local minimum rather than the global minimum. Picture trying to find the deepest hole in your yard: if you measure small sections at a time you may locate a hole, but it may not be the deepest in the yard. The generalized delta rule seeks to correct this by looking at the gradient for the entire surface, not just local gradients.
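As a small illustration, here is the delta rule applied to a single linear neurode. I am reading NeurodeOutput as the output of the sending neurode, which is the input arriving on that weight, and the three training samples (generated from the made-up target 2*x1 - 1*x2) are only there to show the weights settling.

```python
def delta_rule_step(weights, inputs, desired, learning_constant=0.1):
    """One delta rule (LMS) update for a single linear neurode."""
    actual = sum(w * x for w, x in zip(weights, inputs))
    error = desired - actual
    # NewWeight = OldWeight + LearningConstant * input * (desired - actual)
    return [w + learning_constant * x * error for w, x in zip(weights, inputs)]

# Hypothetical samples produced by the target mapping 2*x1 - 1*x2.
samples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]

weights = [0.0, 0.0]
for _ in range(200):                     # repeated passes over the samples
    for inputs, desired in samples:
        weights = delta_rule_step(weights, inputs, desired)
print(weights)                           # should end up close to [2.0, -1.0]
```

For a single linear neurode like this one the error surface has only one minimum, so the local minimum problem does not show up; it appears once hidden layers and non-linear neurodes are involved.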
Simulated annealing is a statistical way to solve optimization problems, like setting a schedule or wiring a network. Boltzmann networks use this algorithm to learn. A random solution is chosen and compared to the current best solution found. The better of the two is kept and then, depending on the problem, some random changes are made. The amount of randomness in each loop is decreased over time, allowing the net to slowly settle into a solution. The randomness helps to keep the net from settling into a local minimum rather than the global minimum.
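Below is a minimal sketch of simulated annealing on a one-dimensional cost function. It uses the textbook Metropolis acceptance step (a worse solution is sometimes accepted, less often as the temperature drops), which goes a little beyond the "keep the better of the two" description above. The cooling schedule, the step size, and the bumpy test function are all assumptions of mine.

```python
import math
import random

def simulated_annealing(cost, start, steps=5000, start_temp=10.0, seed=0):
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    for i in range(steps):
        # Temperature falls linearly toward (but never reaches) zero.
        temp = start_temp * (1.0 - i / steps)
        # Random changes shrink as the temperature drops.
        candidate = current + rng.uniform(-1.0, 1.0) * (temp / start_temp)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature falls.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return best, best_cost

def bumpy(x):
    # Has a shallow local minimum near x = 3.8 and a deeper one near x = -1.3.
    return x * x + 10.0 * math.sin(x)

best_x, best_cost = simulated_annealing(bumpy, start=4.0)
print(best_x, best_cost)
```

Starting at x = 4.0 puts the search in the shallow hole on purpose; the early high-temperature steps are what give it a chance to climb out.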
Self organization is a form of unsupervised learning. This sets weights with a ‘winner take all’ algorithm. Each neurode learns a classification. Input vectors will be classed into the group to which they are closest.
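One step of a "winner take all" update might look like the sketch below. It assumes closeness is measured by Euclidean distance and that only the winning neurode's weights move toward the input; the learning rate and the sample numbers are made up.

```python
def winner_take_all_step(weights, sample, learning_rate=0.5):
    """Find the neurode whose weight vector is closest to the sample,
    move only that neurode toward the sample, and return its index."""
    def squared_distance(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    winner = min(range(len(weights)),
                 key=lambda i: squared_distance(weights[i], sample))
    weights[winner] = [w + learning_rate * (s - w)
                       for w, s in zip(weights[winner], sample)]
    return winner

# Two neurodes, each standing for one class (hypothetical starting weights).
weights = [[0.0, 0.0], [1.0, 1.0]]
winner = winner_take_all_step(weights, [0.9, 0.8])
print(winner, weights)
```

After enough inputs each weight vector drifts toward the center of the group of inputs it keeps winning, which is how each neurode comes to stand for a classification.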
The Lyapunov function, also known as the energy function, is used to test for convergence of the neural network. The function decreases as the network changes and assures stability.
Neural net building tool (Java)
More information:
Birth of a Learn Law, Steve Grossberg
Introduction to Neural Networks
See also:
Neural networks in financial modeling
Song of the neurons