Wavelets

Posted by ljmacphee on July 28, 2008 under artificial intelligence in the news, neural networks | Be the First to Comment

I hadn’t heard anything about wavlets in several years and then this news story caught my eye.

. . .Meningiomas are tumours of the brain and nervous system and they account for 20% of all brain tumours. Doctors have a major problem of discriminating between the four different subtypes of meningiomas but doctors face three key problems in making such a diagnois:

– The work can be painstakingly slow requiring up to two hours of analysis and expert consideration of a full “slide” of information.

– The finest tumour specialists (histopathologists) can at times come up with completely contradictory findings based on slight variations in their method of analysis.

– Currently the slides that specialists examine contain a few million pixels of data and the task of tumour diagnosis is painstakingly slow already. This problem is quite literally growing as medical equipment is coming on stream that can produce slides with hundreds of millions pixel resolution.

. . .

Now researchers in the University of Warwick’s Department of Computer Science have devised a method of using “wavelets” to provide an automated analysis of the varying texture of the tumours and guidance to doctor’s within seconds of being presented the data.

[ read more Wavelets crunch through doctor’s day to long struggle to diagnose brain tumors

Maybe wavelets are about to make a bigger splash in the world of artificial intelligence?

Learn more:
An introduction to wavelets
A really friendly guide to wavelets
Tutorial on continuous wavelet analysis
Wavelet ( Wolfram site )
Wavelets
Wavelets for computer graphics

Code:
WAILI - Wavelets C++ library ( open source )
PyWavelets - Python library ( open source )
Wavelets in Java ( source code provided)

Hopfield Networks

Posted by ljmacphee on March 13, 2007 under neural networks, source code, topics in artificial intelligence | Be the First to Comment

John Hopfield, in the late 1970’s, brought us these networks. These networks can be generalized and are robust. These networks can also be described mathematically. On the downside they can only store 15% as many states as they have neurodes, and the patterns stored must have Hamming distances that are about 50% of the number of neurodes.

Hopfield networks, aka crossbar systems, are a networks that recall what is fed into them. This makes it useful for restoring degraded images. It is a fully connected net, every node is connected to every other node. The nodes are not connected to themselves.

Calculating the weight matrix for a Hopfield network is easy. This is an example with 3 input vectors. You can train the network to match any number of vectors provided that they are orthogonal.

Orthogonal vectors are vectors that give zero when you calculate the dot product.
orthogonal (0, 0, 0, 1) (1, 1, 1, 0) => 0*1 + 0*1 + 0*1 + 1*0 = 0
orthogonal (1, 0, 1, 0) (0, 1, 0, 1) => 1*0 + 0*1 + 1*0 + 0*1 = 0
NOT orthogonal (0, 0, 0, 1) (0, 1, 0, 1) = 0*0 + 0*1 + 0*0 + 1*1 = 1
Orthogonal vectors are perpendicular to each other.

To calculate the weight matrix for the orthogonal vectors (0, 1, 0, 0), (1, 0, 1, 0), (0, 0, 0, 1)
first make sure all the vectors are orthogonal
(0, 1, 0, 0) (1, 0, 1, 0) => 0*1 + 1*0 + 0*1 + 0*0 = 0
(0, 1, 0, 0) (0, 0, 0, 1) => 0*0 + 1*0 + 0*0 + 0*1 = 0
(1, 0, 1, 0) (0, 0, 0, 1) => 1*0 + 1*0 + 1*0 + 0*1 = 0

Change the zeros to negative ones in each vector
(0, 1, 0, 0) => (-1, 1, -1, -1)
(1, 0, 1, 0) => (1, -1, 1, -1)
(0, 0, 0, 1) => (-1, -1, -1, 1)

Multiply each matrix by itself

(-1) (-1, 1, -1, -1)

[ 1, -1, 1, 1]

( 1)

[-1, 1, -1, -1]

(-1)

[ 1, -1, 1, 1]

(-1)

[ 1, -1, 1, 1]

( 1) (1, -1, 1, -1)

[ 1, -1, 1, -1]

(-1)

[-1, 1, -1, 1]

( 1)

[ 1, -1, 1, -1]

(-1)

[-1, 1, -1, 1]

(-1) (-1, -1, -1, 1)

[ 1, 1, 1,-1]

(-1)

[ 1, 1, 1,-1]

(-1)

[ 1, 1, 1,-1]

( 1)

[-1,-1,-1,-1]

The third step is to put zeros on the main diagonal of each matrix and add them together. (Putting zeros on the main diagonal keeps each node from being connected to itself.

( 0,-1, 1, 1)

( 0,-1, 1,-1)

(0, 1, 1,-1)

(-1, 0,-1,-1)

+

(-1, 0,-1, 1)

+

(1, 0, 1,-1)

( 1,-1, 0, 1)

( 1,-1, 0,-1)

(1, 1, 0,-1)

( 1,-1, 1, 1)

(-1, 1,-1, 0)

(1, 1, 1, 0)

The resulting matrix is:

[ 0,-1, 3,-1]

[-1, 0,-1,-1]

[ 3,-1, 0,-1]

[ 1, 1, 0, 1]

The Hopfield network is fully connected, each weight connects to every other weight
[n1] -> [n2] => weight is -1
[n1] -> [n3] => weight is 3
[n1] -> [n4] => weight is -1
[n2] -> [n1] => weight is -1
[n2] -> [n3] => weight is -1
[n2] -> [n4] => weight is -1
[n3] -> [n1] => weight is 3
[n3] -> [n2] => weight is -1
[n3] -> [n4] => weight is -1
[n4] -> [n1] => weight is 1
[n4] -> [n2] => weight is 1
[n4] -> [n3] => weight is 1

They can also be described as having a potential energy surface with conical holes representing the data. Holes having similar depth and diameter represent data with similar properties. The input data seeks the lowest potential energy and settles in to the closest hole. The energy surfaces of these networks are mathematically equivalent to that of ’spin glasses’.

Some problems with these neural nets are they are computationally intensive, use lots of memory, and although I haven’t seen it mentioned I would guess race conditions may present a problem since data is updated continuously at each node with the output from one becoming the input for another.

BAM, bidirectional associative memory is an example of a Hopfield network. It consists of two fully connected layers, one for input and one for output. The nodes in each layer do not have connections to other nodes in the same layer. The weights are bidirectional, meaning that there is communication in both directions along the weight vector. There are no connections between neurodes in the same layer. BAM networks take only -1’s and 1’s as input and only output -1’s and 1’s. So if you are working with binary data, you must convert the zeros to -1’s. The weights are calculated in the same way as the Hopfield example above. The nodes are either 0 or 1 (on or off).
Hopfield Network Example

Backpropagation Networks

Posted by ljmacphee on March 12, 2007 under neural networks, source code, topics in artificial intelligence | Be the First to Comment

Forward Feed Back Propagation networks (aka Three Layer Forward Feed Networks) have been very successful. Some uses include teaching neural networks to play games, speak and recognize things. Backpropagation networks can be used on several network architectures. The networks are all highly interconnected and use non-linear transfer functions. The network must have at minimum three layers, but rarely needs more than three layers.

Back-propagation supervised training for Forward-Feed neural nets uses pairs of input and output patterns. The weights on all the vectors are set to random values. Then input is fed to the net and propagates to the output layer and the errors are calculated. Then the error correction is propagated back through the hidden layer then to the input layer in the network. There is one input neurode for each number (dimension) in the input vector, there is one output neurode for each dimension in the output vector. So the network maps IN-dimensional space to OUT-dimension space. There is no set rule for determining the number of hidden layers or the number of neurodes in the hidden layer. However, if too few hidden neurodes are chosen then the network can not learn. If too many are chosen, then the network memorizes the patterns rather than learning to extract relevant information. A rule of thumb for choosing the number of hidden neurodes is to choose log ( 2)X where X is the number of patterns. So if you have 8 distinct patterns to be learned, then log ( 2)8 = 3 and 3 hidden neurodes are probably needed. This is just a rule of thumb, experiment to see what works best for your situation.

The error vector is aimed at zero during training. The vector is calculated as: Error = ( 1/2 * (sum (desired-actual)^2)) To get the error close to zero, with in a tolerance, we use iteration. Each iteration we move a step downward. We take the gradient, the derivative of a vector, and use the steepest descent to minimize the error. So thenewweight = oldW eight + stepsize (-gradientW (e(W )).

The derivative of the function T (x) = (1/(1 e^-x )) is just T (x) (1 T (x)) so using the chain rule we arrive at the error correction function (desired actual)(1 actual) eachN odeOutW eight eachN odeHiddenW eight the weight is then changed by the amount of the error correction function as it propagates back through the network.

To train the net all weights are randomly set to a value between -1.0 and 1.0

To do the calculations going forward through the net:

Each NodeInput is multplied by each weight connected to it

Each HiddenNode sums up these incoming weights and adds a bias to the total

This value is used in the sigmoid function as x { 1/(1+e^-x) }

If this value is greater than the threshold the HiddenNode fires this value, else it fires zero

Each HiddenNode is multiplied by each weight connected to it

Each OutputNode sums up these incoming weights and adds a bias to the total

This value is used in the sigmoid function as x { 1/(1 + e^-x) }

This is the value out put by the OutputNode

To calculate the adjusments during training, you figure out the error and propigate it back like this:
Adjust weights between HiddenNodes and OutputNodes

ErrorOut = ( OutputNode)*(1-OutputNode)(DesiredOutput - OutputNode)

ErrorHidden = (HiddenNode)*(1-HiddenNode)*(Sum { ErrorOut*Weight + ErrorOut*Weight … } ) for each weight connected to this node

LearningRate = LearningConstant * HiddenNode

(LearningConstant is usually set to something around 0.2 )

Adjustment = ErrorOut * LearningRate

Weight = Weight - Adjustment

Adjust weights between HiddenNodes and InputNodes

Adjustment = ( ErrorHidden)*(LearningConstant)*(NodeInput)

Weight = Weight - Adjustment

Adjust Threshold

On OutputNode, Threshold = Threshold - ErrorOut * LearningRate

On HiddenNode, Threshold = Threshold - ErrorHidden * LearningRate

If you use a neural net that also accounts for imaginary numbers you can adapt this function so it is not always positive and calculate all of the four derivatives needed.

Numerous iterations are required for a backpropagation network to learn. Therefore it is not practical for neural nets that must learn in ‘real time’. It will not always arrive at a correct set of weights. It may get trapped in local minimums rather than an actual minimum. This is a problem with the ’steepest decent’ algorithm. A momentum term that allows the calculation to slide over small bumps is sometimes employed. Back propagation networks do not scale well. They are only good for small neural nets.
Winning Dog Track Predictor ( C/PERL)

Kohnonen Neural Nets ( Self Organizing Networks )

Posted by ljmacphee on March 9, 2007 under neural networks, source code, topics in artificial intelligence | Be the First to Comment

Kohnonen Neural Nets (Self Organizing Networks)

The Kohonen Self Organizing Map (Network) uses unsupervised, competitive learning. These networks are used for data clustering as in, speech recognition and handwriting recognition. They are also used for sparsely distributed data. Self Organizing Networks consist of two layers, an input layer and a Kohonen layer.The input layer has a node for each dimension of the input vector. The input nodes distribute the input pattern to each node in the the Kohonen layer so the two layers are fully connected. The output layer has at least as many nodes as categories to be recognized. One neurode in the output layer will be activated for each pattern. Each input is connected to each output and there are no connections between the layers.

The network uses lateral inhibition, which is how the vision system works in people. Connections are formed to neighboring neurodes which are inhibitory. The strength of the neurode is inversely proportional to the distance it is away from other nodes. The neurode with the strongest signal dampens the neurodes close to it using a Mexican Hat function. (so called because it looks like a Mexican hat.) The Mexican Hat function is also used in wavelets and image processing..

The neurodes close to the one activated take part in the training, the others do not. To make it computationally efficient a step function is used instead of a true Mexican hat function.

General algorithm
The weights between the nodes are initialized to random values between 0.0 and 1.0.
Then the weight vector is normalized.
The learning rate is set between 1.0 and 0.0 and decreased linearly each iteration.
The neighborhood size is set and decreased linearly each iteration The input vector is normalized and fed into the network.
The input vector is multiplied by the connection weights and the total is accumulated by the Kohonen network nodes.

The winning nodes out put is set to one and all the other nodes are set to zero.
Weights are adjusted Wnew = Wold + training constant ( input Wold) Training continues until a winning node vector meets some minimum error standard.
SelfOrganizing Network example

Neural Net Meshes

Posted by ljmacphee on March 8, 2007 under neural networks, topics in artificial intelligence | Be the First to Comment

Neural Net Meshes

Meshes are used in visualization, image processing, neurology and physics applications. They are a grid of regular or irregular shape that stores information or represents a shape rather than a flat object. Neural nets are used to adjust the meshes in 3d graphics.

Meshes also derived from Pask’s Conversation Theory. The gist of the meshes being that distributed information (like that of the Internet) adapts to the semantic expectations of the users. The system then self organizes to meet expectations.