Artificial Neural Networks

We can now look at more sophisticated ANNs, known as multi-layer artificial neural networks because they have hidden layers. These are naturally used to undertake more complicated tasks than perceptrons. We first look at the network structure for multi-layer ANNs, and then in detail at the way in which the weights in such structures can be determined to solve machine learning problems. There are many considerations involved with learning such ANNs, and we consider some of them here. First and foremost, the algorithm can get stuck in local minima, and there are some ways to try to get around this. As with any learning technique, we will also consider the problem of overfitting, and discuss which types of problems an ANN approach is suitable for.

Multi-Layer Network Architectures 

We saw in the previous lecture that perceptrons have limited scope in the type of concepts they can learn - they can only learn linearly separable functions. However, we can construct larger networks by building them out of perceptrons. In such multi-layer networks, we call the units that contain step functions perceptron units.

As with individual perceptrons, multi-layer networks can be used for learning tasks. However, the learning algorithm that we look at (the backpropagation routine) is derived mathematically, using differential calculus. The derivation relies on having a differentiable threshold function, which effectively rules out using perceptron units if we want to be sure that backpropagation works correctly: the step function in perceptrons is not continuous, and hence not differentiable. An alternative unit was therefore chosen which had similar properties to the step function in perceptron units, but which was differentiable. There are many possibilities, one of which is the sigmoid unit, as described below.

Sigmoid units 

Remember that the function inside each unit takes as input the weighted sum, S, of the values coming from the units connected to it. The function inside sigmoid units calculates the following value, given a real-valued input S:

σ(S) = 1/(1 + e^-S)

where e is the base of natural logarithms, e = 2.718...

When we plot the output from sigmoid units given various weighted sums as input, it looks remarkably like a step function, but smoothed out.
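To show the resemblance numerically, here is a minimal Python sketch (the function names step and sigmoid are just for this illustration) that evaluates both a perceptron-style step function and the sigmoid over a range of weighted sums:

import math

def step(s):
    # Perceptron threshold unit: output 1 if the weighted sum is positive, else 0.
    return 1 if s > 0 else 0

def sigmoid(s):
    # Sigmoid unit: 1/(1 + e^-S), a smooth, differentiable alternative to the step.
    return 1.0 / (1.0 + math.exp(-s))

for s in [-5, -2, -0.5, 0, 0.5, 2, 5]:
    print(f"S = {s:5.1f}   step = {step(s)}   sigmoid = {sigmoid(s):.3f}")

For large positive or negative weighted sums the sigmoid output is very close to 1 or 0 respectively, just like the step function, but it changes smoothly in between, which is what makes it differentiable.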

Example Multi-layer ANN with Sigmoid Units 

We will concern ourselves here with ANNs containing only one hidden layer, as this makes describing the backpropagation routine easier. Note that networks where you can feed in the input on the left and propagate it forward to get an output are called feed-forward networks. Below is such an ANN, with two sigmoid units in the hidden layer. The weights have been set arbitrarily between all the units.


Note that the sigmoid units have been identified with sigma signs in the nodes on the graph. As we did with perceptrons, we can give this network an input and determine the output. We can also look to see which units "fired", i.e., had a value closer to 1 than to 0.

Suppose we input the values 10, 30, 20 into the three input units, from  top to bottom. Then the weighted sum coming into H1 will be: 

SH1 = (0.2 * 10) + (-0.1 * 30) + (0.4 * 20) = 2 - 3 + 8 = 7

Then the σ function is applied to SH1 to give:

σ(SH1) = 1/(1+e^-7) = 1/(1+0.000912) = 0.999

[Don't forget to negate S]. Similarly, the weighted sum coming into  H2 will be: 

SH2 = (0.7 * 10) + (-1.2 * 30) + (1.2 * 20) = 7 - 36 + 24 = -5

and σ applied to SH2 gives:

σ(SH2) = 1/(1+e^5) = 1/(1+148.4) = 0.0067

From this, we can see that H1 has fired, but H2 has not. We can now calculate that the weighted sum going into output unit O1 will be:

SO1 = (1.1 * 0.999) + (0.1*0.0067) = 1.0996

and the weighted sum going into output unit O2 will be:

SO2 = (3.1 * 0.999) + (1.17 * 0.0067) = 3.1047

The sigmoid unit in O1 will now calculate the output value from the network for O1:

σ(SO1) = 1/(1+e^-1.0996) = 1/(1+0.333) = 0.750

and the output from the network for O2: 

σ(SO2) = 1/(1+e^-3.1047) = 1/(1+0.045) = 0.957

Therefore, if this network represented the learned rules for a  categorisation problem, the input triple (10,30,20) would be  categorised into the category associated with O2, because this has the  larger output. 
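To make the arithmetic above easy to check, here is a small Python sketch of the same forward pass. The weight values are the ones used in the example network; the variable names are mine, and the tiny differences in the last decimal place arise because the hand calculation rounded the hidden-unit outputs to 0.999 and 0.0067.

import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Weights from the example network: three inputs, two hidden units, two outputs.
weights_to_hidden = [
    [0.2, -0.1, 0.4],   # weights into H1
    [0.7, -1.2, 1.2],   # weights into H2
]
weights_to_output = [
    [1.1, 0.1],         # weights into O1
    [3.1, 1.17],        # weights into O2
]

inputs = [10, 30, 20]

# Hidden layer: weighted sum of the inputs, then the sigmoid.
hidden = []
for name, w in zip(["H1", "H2"], weights_to_hidden):
    s = sum(wi * xi for wi, xi in zip(w, inputs))
    v = sigmoid(s)
    hidden.append(v)
    print(f"S{name} = {s:.4f}   sigma(S{name}) = {v:.4f}")

# Output layer: the same calculation, using the hidden-unit outputs as inputs.
for name, w in zip(["O1", "O2"], weights_to_output):
    s = sum(wi * hi for wi, hi in zip(w, hidden))
    print(f"S{name} = {s:.4f}   sigma(S{name}) = {sigmoid(s):.4f}")

Running this prints SO1 ≈ 1.0997 and SO2 ≈ 3.1050, with sigmoid outputs of about 0.750 and 0.957, so the triple is again placed in the category associated with O2.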

Backpropagation Learning Routine 

As with perceptrons, the information in the network is stored in the weights, so the learning problem comes down to the question: how do we train the weights to best categorise the training examples? We then hope that this representation provides a good way to categorise unseen examples.

In outline, the backpropagation method is the same as for perceptrons:

We choose and fix our architecture for the network, which will  contain input, hidden and output units, all of which will contain  sigmoid functions.

We randomly assign the weights between all the nodes. The  assignments should be to small numbers, usually between -0.5 and  0.5. 

Each training example is used, one after another, to re-train the  weights in the network. The way this is done is given in detail below. 

After each epoch (run through all the training examples), a  termination condition is checked (also detailed below). Note that, for  this method, we are not guaranteed to find weights which give the  network the global minimum error, i.e., perfectly correct  categorisation of the training examples. Hence the termination  condition may have to be in terms of a (possibly small) number of  mis-categorisations. We see later that this might not be such a good  idea, though. 
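Putting this outline into code, the following is a minimal Python sketch of the routine for a network with one hidden layer of sigmoid units. It is only a sketch under stated assumptions: the function names (forward, train), the learning rate eta and the epoch limit are mine; the output-unit error term is the one given in the next section, and the hidden-unit error term and weight-update rule are the standard ones for sigmoid units; the termination condition is simply "no mis-categorised training examples", with the caveat noted above.

import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(w_ih, w_ho, inputs):
    # Propagate an example through the network; return hidden and output values.
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w_ih]
    outputs = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_ho]
    return hidden, outputs

def train(examples, n_in, n_hidden, n_out, eta=0.1, max_epochs=1000):
    # examples is a list of (inputs, targets) pairs, with one-hot target lists.
    # Step 1: randomly assign small weights, here between -0.5 and 0.5.
    w_ih = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w_ho = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]

    for epoch in range(max_epochs):
        # Step 2: use each training example in turn to adjust the weights.
        for inputs, targets in examples:
            hidden, outputs = forward(w_ih, w_ho, inputs)

            # Error term for each output unit Ok: ok(1 - ok)(tk - ok).
            d_out = [o * (1 - o) * (t - o) for o, t in zip(outputs, targets)]

            # Error term for each hidden unit Hj: hj(1 - hj) * sum over k of w_jk * d_out[k].
            d_hid = [h * (1 - h) * sum(w_ho[k][j] * d_out[k] for k in range(n_out))
                     for j, h in enumerate(hidden)]

            # Weight update: add eta * (error term of receiving unit) * (value sent along the weight).
            for k in range(n_out):
                for j in range(n_hidden):
                    w_ho[k][j] += eta * d_out[k] * hidden[j]
            for j in range(n_hidden):
                for i in range(n_in):
                    w_ih[j][i] += eta * d_hid[j] * inputs[i]

        # Step 3: after each epoch, check the termination condition -
        # here, stop once every example falls into its correct category.
        wrong = 0
        for inputs, targets in examples:
            _, outputs = forward(w_ih, w_ho, inputs)
            if outputs.index(max(outputs)) != targets.index(max(targets)):
                wrong += 1
        if wrong == 0:
            break

    return w_ih, w_ho

# Hypothetical usage: two-input examples, each labelled with a one-hot target.
# data = [([0.0, 1.0], [1, 0]), ([1.0, 0.0], [0, 1])]
# w_ih, w_ho = train(data, n_in=2, n_hidden=2, n_out=2)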

Weight Training Calculations 

Because we have more weights in our network than in perceptrons, we first need to introduce the notation wij to specify the weight between unit i and unit j. As with perceptrons, we will calculate a value Δij to add on to each weight in the network after an example has been tried. To calculate the weight changes for a particular example, E, we first start with the information about how the network should perform for E. That is, we write down the target values ti(E) that each output unit Oi should produce for E. Note that, for categorisation problems, ti(E) will be zero for all the output units except one, which is the unit associated with the correct categorisation for E. For that unit, ti(E) will be 1.

Next, example E is propagated through the network so that we can record all the observed values oi(E) for the output nodes Oi. At the same time, we record all the observed values hi(E) for the hidden nodes. Then, for each output unit Ok, we calculate its error term as follows:

δOk = ok(E)(1 - ok(E))(tk(E) - ok(E))
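As a concrete illustration, suppose the example network above were being trained on an example E whose correct category is the one associated with O2, so that (hypothetically) t1(E) = 0 and t2(E) = 1. Using the observed outputs o1(E) = 0.750 and o2(E) = 0.957 from the worked example, the output error terms could be calculated as in this short Python sketch:

# Observed outputs from the worked example, and hypothetical one-hot targets
# for an example whose correct category is the one associated with O2.
observed = [0.750, 0.957]
targets = [0, 1]

for name, o, t in zip(["O1", "O2"], observed, targets):
    delta = o * (1 - o) * (t - o)   # error term: o(E)(1 - o(E))(t(E) - o(E))
    print(f"delta_{name} = {delta:.4f}")

This gives δO1 ≈ -0.141 (O1's output of 0.750 was too high for its target of 0) and δO2 ≈ 0.002 (O2's output of 0.957 was already close to its target of 1).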

