Promise of spring reveals interesting hybrid variants
Those of us who have been through a few tech cycles have learned to be cautious, so for the second article in this series I thought it might be helpful to examine the state of AI algorithms to answer the question: what’s different this time?
I reached out to leading AI labs for their perspective, including Jürgen Schmidhuber at the Swiss AI Lab IDSIA. Jürgen’s former students include team members at DeepMind who co-authored a paper on deep reinforcement learning recently published in Nature.
Our Recurrent Neural Networks (RNNs) have revolutionized speech recognition and many other fields, broke all kinds of benchmark records, and are now widely used in industry by Google (Sak et al.), Baidu (Hannun et al.), Microsoft (Fan et al.), IBM (Fernandez et al.), and many others. — Jürgen Schmidhuber
Jürgen recently published an overview of deep learning in neural networks with input from many others, including Yann LeCun, Director of AI Research at Facebook. Lee Gomes recently interviewed LeCun, who provided one of the best definitions of applied AI I’ve seen:
It’s very much interplay between intuitive insights, theoretical modeling, practical implementations, empirical studies, and scientific analyses. The insight is creative thinking, the modeling is mathematics, the implementation is engineering and sheer hacking, the empirical study and the analysis are actual science. What I am most fond of are beautiful and simple theoretical ideas that can be translated into something that works. — Yann LeCun at IEEE Spectrum.
Convolutional neural networks (CNNs)
Much of the recent advancement in AI has been due to convolutional neural networks, which can be trained to mimic partial functionality of the human visual cortex. The inability to accurately identify objects is a common problem, which slows productivity, increases risk, and causes accidents worldwide.
CNNs combine local filtering with max-pooling, and because each filter is shared across the whole input they need far fewer parameters than a standard fully connected multilayer network, which makes them easier to train. The invention and evolution of the backpropagation (BP) algorithm for propagating errors through multiple nonlinear layers, combined with other supervised learning methods, has given nascent artificial intelligence systems the ability to learn continuously.
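To make local filtering and max-pooling concrete, here is a minimal sketch in plain Python. The 6x6 toy image, the hand-picked edge filter, and the function names are all illustrative assumptions; a real CNN learns its filters from data and would be built with a numerical library.

```python
def conv2d_valid(image, kernel):
    """Slide one small kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


# Toy 6x6 image with a vertical edge between columns 2 and 3 (assumed data).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hypothetical 3x3 vertical-edge filter, hand-picked for illustration.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = conv2d_valid(image, kernel)  # 4x4 map, strongest at the edge
pooled = max_pool2d(feature_map)           # 2x2 summary of the map

# Weights needed to produce the same 4x4 output (biases ignored):
conv_weights = 3 * 3                # one filter shared across all positions
dense_weights = (6 * 6) * (4 * 4)   # every input wired to every output

print(feature_map)
print(pooled)
print(conv_weights, dense_weights)
```

The shared filter needs 9 weights where a fully connected layer of the same output size would need 576. That gap is the parameter saving referred to above.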
CNNs are valuable for a wide range of applications such as diagnostics in healthcare, agriculture, supply chain quality control and automated disaster prevention in all sectors. CNNs are also applied in high performance large-vocabulary continuous speech recognition (LVCSR).
Yoshua Bengio is Professor at Université de Montréal and head of the Machine Learning Laboratory (LISA). He is making good progress on a new deep learning book for MIT Press with co-authors Ian Goodfellow and Aaron Courville.
What's different?
- More compute power (the most important element)
- More labeled data
- Better algorithms for supervised learning (the algorithms of 20 years ago, as is, don't work that well, but a few small changes discovered in recent years make a huge difference)
— Yoshua Bengio
Yoshua and several colleagues recently proposed a novel approach to training thin and deep networks, called FitNets, which uses ‘hints’ to improve the networks' ability to generalize while significantly reducing the computational burden. In an email exchange, he shared insights on thin nets:
The thin deep net idea is a procedure for helping to train thinner and deeper networks. You can see deep nets as a rectangle: what we call depth corresponds to its height (number of layers) and what we called thickness (or its opposite, being thin) is the width of the rectangle (number of neurons per layer). Deeper networks are harder to train but can potentially generalize better, i.e., make better predictions on new examples. Thinner networks are even harder to train, but if you can train them they generalize even better (if not too thin!). — Yoshua Bengio
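The rectangle picture can be put into numbers. The sketch below counts the weights and biases in a fully connected network of a given depth and width; the layer sizes are arbitrary values chosen for illustration and do not come from the FitNets work. It shows only why a thin, deep net can be cheaper to store and run, and says nothing about how well either shape trains or generalizes.

```python
def mlp_weight_count(n_inputs, width, depth, n_outputs):
    """Count weights and biases in a fully connected net with `depth`
    hidden layers of `width` neurons each."""
    sizes = [n_inputs] + [width] * depth + [n_outputs]
    # Each layer has (fan_in + 1) parameters per output neuron (+1 = bias).
    return sum((fan_in + 1) * fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))


# Hypothetical shapes: a short, wide rectangle and a tall, thin one.
wide_shallow = mlp_weight_count(n_inputs=100, width=512, depth=2, n_outputs=10)
thin_deep = mlp_weight_count(n_inputs=100, width=64, depth=10, n_outputs=10)

print(wide_shallow)  # 319498
print(thin_deep)     # 44554
```

With these assumed sizes, the ten-layer thin net has roughly one seventh of the parameters of the two-layer wide net.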
Long Short-Term Memory (LSTM)
Sepp Hochreiter is head of the Institute of Bioinformatics at the Johannes Kepler University (JKU) Linz, and was Schmidhuber’s first student in 1991. Schmidhuber credits Sepp's work for “formally showing that deep neural networks are hard to train, because they suffer from the now famous problem of vanishing or exploding gradients”.
Signals that decay exponentially, or explode out of bounds, were, as scientists are fond of saying, a ‘non-trivial’ challenge, and it took a series of complex solutions to achieve the recent progress.
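A deliberately simplified sketch shows where the exponential behaviour comes from. It assumes a chain of identical linear layers (or time steps) with a single scalar weight, so the backpropagated error is simply multiplied by that weight once per layer. Real networks use weight matrices and nonlinearities, but the compounding effect is the same in spirit.

```python
def gradient_through_depth(weight, depth):
    """Factor by which an error signal is scaled after passing back through
    `depth` identical linear layers (or time steps) with a scalar weight."""
    grad = 1.0
    for _ in range(depth):
        grad *= weight  # chain rule: one multiplication per layer
    return grad


# Hypothetical weights and depth, chosen only to illustrate the three regimes.
vanishing = gradient_through_depth(0.5, 50)  # shrinks towards zero
exploding = gradient_through_depth(1.5, 50)  # grows out of bounds
stable = gradient_through_depth(1.0, 50)     # passes through unchanged

print(f"{vanishing:.3e}  {exploding:.3e}  {stable}")
```

Only a per-step factor of exactly 1 leaves the signal intact over 50 steps. That is the intuition usually given for why LSTM's memory cell can carry error signals across long time lags.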
The advent of Big Data together with advanced and parallel hardware architectures gave these old nets a boost such that they currently revolutionize speech and vision under the brand Deep Learning. In particular the "long short-term memory" (LSTM) network, developed by us 25 years ago, is now one of the most successful speech recognition and language generation methods. — Sepp Hochreiter
Expectations for the near future
We will go beyond mere pattern recognition towards the grand goal of AI, which is more or less: efficient reinforcement learning (RL) in complex, realistic, partially observable environments... I believe it will be possible to greatly scale up such approaches, and build RL robots that really deserve the name. — Jürgen Schmidhuber at INNS.
Unsupervised learning and reinforcement learning remain prizes in the future (and necessary for real progress towards AI, among other things), in spite of intense research activity and promising advances. — Yoshua Bengio via email.
Deep Learning techniques have the potential to advance unsupervised methods like biclustering to improve drug design or detect genetic relationships among population groups. Another trend will be algorithms that store the current context like a working memory. — Sepp Hochreiter via email.
Pioneers in ML and AI deserve a great deal of credit, as do sponsors who funded R&D through long winters. One difference I see today versus previous cycles is that the components in network computing have now created a more sustainable environment for AI, with greater variety of profitable business models that are dependent upon improvement.
In addition, awareness is growing that learning algorithms improve continuously and create more value over time, so organizations have a strong economic incentive to commit resources early or risk disruption.
In the applied world we face many challenges, including security, compliance, markets, talent, and customers. Fortunately, although emerging AI creates new challenges of its own, it also provides the opportunity to overcome serious problems that cannot be solved otherwise.
Mark Montgomery is the founder and CEO of Kyield, originator of the theorem ‘yield management of knowledge’, and inventor of the patented AI system that serves as the foundation for Kyield: ‘Modular System for Optimizing Knowledge Yield in the Digital Workplace’. He can be reached at firstname.lastname@example.org.
This article was originally published at Computerworld (2/2015)