The figure on the right demonstrates the use of Softmax in the final stage of a brain tumor classification CNN illustrated in the previous Q&A.
Here, the raw output data [−0.1, 3.8, 1.1, −0.3] has been transformed into relative "probabilities" by Softmax.
In brief, the Softmax function uses exponentials ($e^{z_i}$) to convert the individual components of the output vector into positive numbers. For a given vector element ($z_i$), Softmax transforms the value into a probability by first exponentiating it, then dividing by the sum of the exponentials of all the elements in the original output vector. As an example, the Softmax value for the second vector element $z_2 = 3.8$ is calculated as

$$\mathrm{Softmax}(z_2) = \frac{e^{3.8}}{e^{-0.1} + e^{3.8} + e^{1.1} + e^{-0.3}} \approx \frac{44.70}{49.35} \approx 0.906$$
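The same calculation can be sketched in a few lines of Python. This is a minimal illustration rather than the code used by the CNN in the figure; the names `softmax`, `logits`, and `probs` are chosen here for the example, and subtracting the maximum before exponentiating is a common numerical-stability step that does not change the result.

```python
import math

def softmax(z):
    """Return the Softmax of a list of raw output values."""
    # Shifting by the maximum avoids overflow for large inputs;
    # the shift cancels in the ratio, so the output is unchanged.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Raw output vector from the example in the text
logits = [-0.1, 3.8, 1.1, -0.3]
probs = softmax(logits)

print([round(p, 3) for p in probs])  # [0.018, 0.906, 0.061, 0.015]
```

The second element dominates (about 0.906), and the four values sum to 1, which is what allows them to be read as relative "probabilities."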
Advanced Discussion
The Softmax function is essentially the same as the Boltzmann distribution from statistical mechanics. As you may recall, the Boltzmann distribution gives the probability that a system will be in a certain state as a function of that state's energy and the system's temperature.
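To make the correspondence explicit, the two expressions can be written side by side. The symbols $E_i$ (energy of state $i$), $k_B$ (Boltzmann constant), and $T$ (temperature) are standard statistical-mechanics notation assumed here for illustration; they do not appear in the text above.

```latex
p_i = \frac{e^{-E_i / k_B T}}{\sum_j e^{-E_j / k_B T}}
\qquad\longleftrightarrow\qquad
\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}},
\qquad z_i = -\frac{E_i}{k_B T}
```

Under this reading, a larger raw output $z_i$ plays the role of a lower-energy, and therefore more probable, state.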
The question naturally arises as to why one should go to the trouble of using exponentials rather than simply normalizing the raw output data. One obvious problem is that the raw data may contain negative numbers, which would produce negative "probabilities," and the sum in the denominator may be zero. A ratio of the absolute values of the raw data could be used instead, but such a function is not differentiable at zero.
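A short Python sketch can make these failure modes concrete. The vector `balanced` is a made-up example whose elements sum to zero, and the variable names are assumptions introduced only for this illustration.

```python
import math

# Raw output vector from the example in the text
z = [-0.1, 3.8, 1.1, -0.3]

# Naive normalization: divide each raw value by the sum of raw values.
# Negative inputs come out as negative "probabilities".
naive = [v / sum(z) for v in z]

# Hypothetical vector whose elements cancel: the denominator is zero.
balanced = [1.0, -1.0, 2.0, -2.0]
try:
    [v / sum(balanced) for v in balanced]
    naive_failed = False
except ZeroDivisionError:
    naive_failed = True

# Softmax: exponentials are always positive, so the denominator is never
# zero and every output lies strictly between 0 and 1.
exps = [math.exp(v) for v in balanced]
soft = [e / sum(exps) for e in exps]

print(min(naive) < 0)   # True: naive normalization yields a negative value
print(naive_failed)     # True: zero denominator for the balanced vector
print(all(0 < p < 1 for p in soft))  # True
```

Softmax handles both vectors without special cases, which is part of the practical appeal of the exponential form.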