Jorge Guerra Pires
A crash view on Artificial Neural Networks: from perceptrons to deep learning
Updated: Oct 8, 2022
Neural Networks (NNs), or Artificial Neural Networks (ANNs), started as a big promise, and their early models were quite simple compared to the models we have today: a single neuron with a binary output based on a threshold.
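To make this concrete, here is a minimal sketch of such a threshold neuron (a perceptron) in Python. The weights and bias are illustrative choices, not taken from any particular historical model; with these values the neuron happens to compute a logical AND.

```python
import numpy as np

def perceptron(x, w, b):
    """A single threshold neuron: outputs 1 only if the
    weighted sum of its inputs plus the bias exceeds zero."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Illustrative weights that make the neuron behave as a logical AND
w = np.array([1.0, 1.0])
b = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", perceptron(np.array(x), w, b))
```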
On one side, people from neuroscience saw in these models possible explanations for biological phenomena (i.e., in silico simulations); on the other, applied mathematicians and computer scientists were looking for new solutions to problems that resisted conventional approaches (e.g., the XOR problem).
As the story goes, a mathematician proved the limitations of the model; we are still waiting for another one to dive into the complexity of deep learning and play that game again. Even for simpler methods such as simulated annealing, researchers struggle to prove convergence, even though numerically we know it converges to the best solution. However, what really limited the application of NNs was the fact that we could not train neurons arranged in layers, until the backpropagation algorithm brought NNs back into the spotlight in the 1980s, after almost thirty years of silence and lost promises.
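As an illustration of what backpropagation made possible, the sketch below trains a tiny network with one hidden layer on the XOR problem, which a single threshold neuron cannot solve. The hyperparameters (hidden-layer size, learning rate, number of iterations) are assumed for the example, not a reproduction of any historical implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

# One hidden layer of 4 sigmoid units, random initial weights
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(10000):
    # Forward pass through hidden and output layers
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the output error back to the
    # hidden layer (the chain rule, applied layer by layer)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 2))  # should be close to [[0], [1], [1], [0]]
```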
Problems of data segmentation require quite complex boundary definitions [1]. However, another problem came up: we still could not train networks with several hidden layers, not to mention that NNs forget the knowledge already acquired when presented with new training sessions: imagine if, whenever you learned a new discipline in college, you completely forgot the previous one; you would never finish college!
The problem of training several layers was solved with the algorithms used in deep learning, and the problem of forgetting what was previously learned was also addressed by techniques such as transfer learning; an early proposal along these lines was Adaptive Resonance Theory (ART). Thus, deep learning is the conjunction of several problems of NNs solved along the way.

We still have the old issue of not being able to make sense of how NNs learn, or how to make sure they will not fail; this did not stop applied mathematicians and computer scientists from using them in several contexts, mainly in video and image processing (e.g., computer vision and image recognition). We also have the problem of NNs not being able to reason about complex scenarios, such as human-related ones; as Daniel Kahneman likes to say, even though these models may guess that someone who breaks a leg will not go to a movie tonight as a consequence, the model has no idea why that is so. This can be called the "broken leg paradox".
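To give a flavor of how transfer learning looks in modern practice, the sketch below freezes a pretrained image model and trains only a new classification head on top, so the previously acquired knowledge is preserved. The choice of base model (MobileNetV2), input size, and head are assumptions for illustration, and the dataset names are hypothetical; this is not part of the original ART proposal.

```python
import tensorflow as tf

# Load a base model pretrained on ImageNet and freeze its weights,
# so the knowledge it already acquired is not overwritten.
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False

# Only this new head is trained on the new task (here, a binary classifier).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(new_task_images, new_task_labels, epochs=5)  # hypothetical dataset
```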