Deep learning is hot right now. Applications such as voice recognition, facial recognition, language translation, medical diagnostics, self-driving vehicles, and even credit fraud detection are becoming ever more woven into the fabric of modern life. Because of these successes, and the opportunities they open up for further extensions of the technology, deep learning is currently one of the most active fields in computer science research, and progress has been rapid. In this article we'll take a brief look at several of the latest trends in deep learning research.
Autonomous Vehicle Systems
Perhaps the area of deep learning research that has received the most public notice in recent years relates to the advent of driverless cars and trucks. A number of companies are planning to have autonomous vehicles in widespread operation within the next several years. In fact, according to Reuters, General Motors "plans to deploy thousands of self-driving electric cars in test fleets in partnership with ride-sharing affiliate Lyft Inc, beginning in 2018." So research into all facets of what it takes to equip vehicles to navigate safely on their own is of critical importance.
One such research project is being carried out by researchers at the Waterloo Artificial Intelligence Institute of the University of Waterloo in Canada. It addresses an issue of primary importance if autonomous vehicles are to operate on their own in an environment where they must share the road with human drivers.
In a paper entitled "MicronNet: A Highly Compact Deep Convolutional Neural Network Architecture for Real-time Embedded Traffic Sign Classification," the research team reports on its efforts to improve the ability of neural networks to recognize and act on the traffic signs autonomous vehicles will encounter on the road.
Current neural net technology can already perform the traffic sign recognition task at a level of accuracy that rivals what humans achieve. The problem, however, is that the task is so compute-intensive that designing practical implementations that can be embedded in vehicles and perform recognition in real time is difficult. The aim of the MicronNet research project is to develop a deep learning architecture that has both a high degree of compactness and the computational speed required for embedding in practical driverless vehicles.
MicronNet is described as "a highly compact deep convolutional neural network designed specifically for real-time embedded traffic sign recognition." It aims to cut the number of real-time computations to a minimum while still delivering top-level sign recognition accuracy. The key to this endeavor is optimizing the microarchitecture of each layer of the convolutional neural network so as to limit the number of parameters it requires. The researchers arrived at a design with 27x fewer parameters than state-of-the-art models, while still achieving human-level accuracy.
The current research project was conducted using the German Traffic Sign Recognition Benchmark (GTSRB), and is thus directly applicable only to a limited set of traffic scenarios. Future research will aim at generalizing the model so that it can achieve similar results across a wide range of traffic environments.
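To make the parameter-budget idea concrete, here is a minimal sketch (in PyTorch) of a parameter-lean traffic sign classifier. It is not the published MicronNet design: the 48x48 RGB input size is an assumption, and the 43 output classes simply match the GTSRB benchmark. The point is that narrow convolutional layers followed by a single small fully connected layer keep the parameter count in the tens of thousands rather than the millions typical of large image classifiers.

```python
import torch
import torch.nn as nn

# Illustrative only -- not the published MicronNet architecture.
class TinySignNet(nn.Module):
    def __init__(self, num_classes=43):                  # 43 sign classes, as in GTSRB
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),   # few filters per layer keeps parameters low
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                              # 48x48 -> 24x24
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                              # 24x24 -> 12x12
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                              # 12x12 -> 6x6
        )
        self.classifier = nn.Linear(32 * 6 * 6, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = TinySignNet()
print(sum(p.numel() for p in model.parameters()))  # a few tens of thousands of weights
```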
Machine Vision
An important capability, not only for autonomous and semi-autonomous vehicles but also in areas such as industrial robotics, is the ability of neural nets to understand what nearby humans are doing. For example, one research project being carried out at MIT is based on the principle that even in autonomous vehicles, human-machine interaction remains critical. In his paper "Human-Centered Autonomous Vehicle Systems: Principles of Effective Shared Autonomy," MIT research scientist Dr. Lex Fridman questions the assumption that autonomous vehicles can safely operate entirely on their own in an environment shared with human drivers. He contends that even the best AI systems are insufficient for that task, and that humans must therefore be kept in the loop. "The human-machine team must jointly maintain sufficient situation awareness to maintain control of the vehicle," he says.
That being the case, not only must the vehicle be aware of what's happening in its surrounding environment, but it is also critically important that it understand what Dr. Fridman calls the "driver state." That is, the vehicle must be able to detect "driver glance region, cognitive load, activity, hand and body position."
Sensing driver state may be facilitated by another recent research project that aims at enabling machine vision systems to track the gaze of humans with whom they interact. In a paper called "Light-weight Head Pose Invariant Gaze Tracking," a team of researchers from the University of Maryland and NVIDIA detail their work developing algorithms that allow convolutional neural networks to reliably and accurately determine where a human is looking, even with wide differences in head pose, individual physical characteristics, illumination, and image quality. The research is driven largely by the fact that in many applications gaze tracking must run on low-cost processors, yet must produce gaze estimates quickly enough to operate in real time.
Noting that unconstrained gaze tracking remains a very difficult problem, the researchers report that their project has so far succeeded in improving the robustness (consistent accuracy under less-than-ideal circumstances) of gaze classifiers, as well as increasing estimation speed by a factor of ten, without increasing costs.
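As a rough illustration of what a light-weight gaze estimator can look like, the sketch below regresses gaze yaw and pitch from a small grayscale eye crop, with the estimated head pose concatenated into the final layers so the prediction can adapt to pose. This is a sketch under assumptions, not the architecture from the paper: the 36x60 crop size and the three-angle head pose input are illustrative choices.

```python
import torch
import torch.nn as nn

# Illustrative light-weight gaze regressor -- not the authors' network.
class GazeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                             # 36x60 eye crop -> 18x30
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                             # 18x30 -> 9x15
        )
        self.head = nn.Sequential(
            nn.Linear(32 * 9 * 15 + 3, 64), nn.ReLU(),   # +3 for head pose (yaw, pitch, roll)
            nn.Linear(64, 2),                            # gaze yaw and pitch
        )

    def forward(self, eye_image, head_pose):
        feats = self.backbone(eye_image).flatten(1)
        return self.head(torch.cat([feats, head_pose], dim=1))
```

Feeding the head pose in directly is one simple way to make the gaze estimate less sensitive to how the head is oriented, while keeping the convolutional backbone shallow is what keeps the model cheap enough for low-cost processors.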
Deep Learning System Security
With deep learning now being used in a growing number of mission-critical application areas, researchers are increasingly focused on uncovering and fixing the security vulnerabilities inherent in deep learning models before they can be exploited by unscrupulous actors. According to researchers at the University of California, Berkeley, even state-of-the-art deep learning systems can be "easily fooled" by an attacker.
Mapping out the various methods by which the security of deep learning systems can be compromised, with the ultimate goal of finding ways of eliminating such vulnerabilities, is the province of an entire area of advanced research called "adversarial machine learning."
One quite effective means of compromising modern deep learning systems is through backdoor attacks using data poisoning. A backdoor allows an adversary to gain entry to a computer system without satisfying the authentication regime (such as entering a valid password) normally required for access.
Neural networks are particularly vulnerable to "data poisoning" attacks because they require large amounts of training data. In the current state of deep learning technology, an attacker who manages to inject even a small amount of contrived data into the training dataset may be able to effectively take control of the target deep learning system. That's the conclusion reported by one team of researchers in a paper entitled "Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning." They found that even if an adversary has no knowledge of the victim system's neural network configuration or training dataset, by inserting as few as 50 well-chosen samples into the training data the attacker can achieve a success rate above 90 percent.
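The mechanics of such an attack can be surprisingly simple. The sketch below is a simplified, trigger-based illustration, not the exact procedure from the paper: it stamps a small pixel pattern onto roughly 50 training images and relabels them as the attacker's chosen class, so that a network trained on the poisoned set learns to associate the trigger with that class. It assumes the images are stored as a NumPy float array in [0, 1] with shape (N, H, W, C).

```python
import numpy as np

# Simplified illustration of trigger-based data poisoning -- not the paper's exact method.
def poison_dataset(images, labels, target_class, num_poison=50, seed=0):
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=num_poison, replace=False)
    for i in idx:
        images[i, -4:, -4:, :] = 1.0   # stamp a 4x4 white "trigger" patch in the corner
        labels[i] = target_class       # relabel as the attacker's chosen class
    return images, labels
```

At inference time, any input carrying the same trigger patch tends to be pushed toward the target class, while clean inputs are classified normally, which is what makes such backdoors hard to notice.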
In fact, a research team at the University of Maryland notes that a face recognition neural network might be compromised by even a single tainted image. In their paper, "Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks," the researchers detail an attack mode in which such images are simply placed online in locations where the developers of the target system, needing vast amounts of visual training data, might find them and add them to their training dataset.
At this point most research projects in the adversarial machine learning field seem aimed at developing a broad understanding of the various attack modes to which neural networks may be subjected. This, in turn, will enable heightened awareness and vigilance in the present, while facilitating the ongoing development of a comprehensive suite of effective countermeasures.
Seeing Into the Neural Network Black Box
One of the most critical areas of deep learning research today involves what is called the "black box" problem. The term "black box" refers to a system in which only the inputs and outputs can be seen. What happens inside the system that causes a particular output to result from some specific sequence or combination of inputs cannot be directly observed.
The neural networks on which deep learning depends are by nature black box systems. They consist of a number (often thousands or millions) of software neurons arranged into distinct layers: an input layer, an output layer, and one or more (sometimes thousands of) "hidden" layers in between. It is the interconnections between these layers, as modified by the weighting factors applied at the inputs of each layer, that allow the system to extract the complex features or patterns present in its input data and make predictions based on them.
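A minimal example makes this layered structure concrete: an input layer, two hidden layers, and an output layer (the 784-dimensional input and 10 output classes here are arbitrary, illustrative choices). Every weight in the network can be listed and inspected, yet the raw numbers say almost nothing about what the network has learned.

```python
import torch.nn as nn

# A tiny fully connected network: input layer, two "hidden" layers, output layer.
net = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),   # input -> first hidden layer
    nn.Linear(128, 64), nn.ReLU(),    # first hidden -> second hidden layer
    nn.Linear(64, 10),                # second hidden -> output layer (e.g. 10 classes)
)

for name, param in net.named_parameters():
    print(name, tuple(param.shape))   # the weights are inspectable, but not interpretable
```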
The neural network black box problem arises because, although the weights applied to the hidden layers of the system can be examined, knowing them tells a human observer very little about the factors on which the system's predictions or recommendations are based. For example, a neural network may be trained to determine whether a visual scene contains a dog or a cat, and may do so quite accurately. But there is no humanly understandable account of why the system reached the conclusion it did.
In many use cases that’s not a problem. But when deep learning systems are used, as they often are today, to make recommendations to a judge as to whether accused individuals should receive bail, or to a bank concerning whether an application for credit should be approved, it is of vital importance that the reasoning behind such recommendations be articulable. In fact, in many application areas, such as healthcare, that type of transparency is now a legal requirement.
There are currently a number of research projects aimed at solving the black box problem. Of particular note is a project being carried out by Google information scientists. In a paper entitled "Axiomatic Attribution for Deep Networks," the researchers identify two fundamental axioms, Sensitivity and Implementation Invariance, that any effective attribution method must satisfy. They then describe a new attribution method, called Integrated Gradients, that satisfies both.
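In essence, Integrated Gradients attributes a prediction to each input feature by accumulating the model's gradients along a straight-line path from a reference "baseline" input to the actual input, then scaling by the input-baseline difference. The sketch below approximates that path integral with a simple Riemann sum in PyTorch; the function name and the choice of an all-zeros baseline are illustrative, not taken from the paper's code.

```python
import torch

def integrated_gradients(model, x, baseline, target_class, steps=50):
    # model: a classifier that takes a batch and returns per-class scores.
    # x, baseline: a single input (e.g. an image) and its reference point.
    total_grads = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        # Point on the straight-line path from the baseline to the input.
        point = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
        score = model(point.unsqueeze(0))[0, target_class]
        total_grads += torch.autograd.grad(score, point)[0]
    # Average the path gradients and scale by the input-baseline difference.
    return (x - baseline) * total_grads / steps

# e.g. attributions = integrated_gradients(net, image, torch.zeros_like(image), target_class=3)
```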
As yet there is no generally applicable practical solution to the deep learning black box problem. But, as the Google effort demonstrates, it’s high on the agendas of a number of highly qualified researchers. When it is solved, a significant restraint on the expansion of deep learning into important areas of modern life will be removed.