Researchers at UC Irvine's AI Lab and at Microsoft Research used deep learning to teach industrial robot systems to pick up tools and assemble pieces of different shapes. The approach uses novel training algorithms to coordinate tasks across autonomous agents. Applied to industrial environments where objects are frequently stacked on top of one another, multi-agent reinforcement learning can accomplish these tasks by rewarding or penalizing specific actions and behaviors.
The research also demonstrates a technique that improves learning by combining unsupervised data with supervised data, allowing more efficient prediction of human behavior.
The team was awarded a $1.65 million grant this week by the National Science Foundation (NSF) to develop technologies that enhance robotics.
This article is an abridged version of an oral presentation given at NIPS 2017.
The paper is built around three ideas. First, it describes the use of "task-specific reward functions" to improve task performance. Second, it describes a system called Multi-Task Augmentation Networks (MTANs), built on an architecture called the Task-Specific Reward Function Network (TSReF), which learns reward distributions within a given environment. Third, it presents reinforcement learning as a way to tackle complex problems, such as what humanoid robots need and how they should be used in enterprise automation.
Automating complicated processes such as assembly requires a combination of task-specific actions, some knowledge of the shape being assembled, and enough information to figure out where to place each component.
Multi-Task Augmentation Network (MTAN) training builds on the general framework we call DeepMind Vision and then adaptively learns additional parameters. MTAN learns how to select optimal action functions, and the model performs better than traditional approaches based on simple rules. An agent also has access to rewards at various points, so it needs to know which actions will lead to those rewards, regardless of whether the incentives come from its own interests or another agent's. It even improves with more tasks: it learns not only when it will receive more reward, but also exactly how much.
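To make the idea concrete, here is a minimal sketch of a task-conditioned Q-learner in Python. This illustrates multi-task value learning in general, not the MTAN architecture itself; all class and parameter names are hypothetical.

```python
import random
from collections import defaultdict

# Minimal sketch of a task-conditioned Q-learner (illustrative only,
# not the MTAN architecture from the paper). Q-values are indexed by
# (task, state, action), so one agent can learn several tasks.
class MultiTaskQAgent:
    def __init__(self, actions, lr=0.1, gamma=0.95, eps=0.1):
        self.q = defaultdict(float)   # (task, state, action) -> value
        self.actions = actions
        self.lr, self.gamma, self.eps = lr, gamma, eps

    def act(self, task, state):
        # Epsilon-greedy: explore occasionally, otherwise pick the
        # action with the highest estimated value for this task.
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(task, state, a)])

    def update(self, task, state, action, reward, next_state):
        # One-step Q-learning update, conditioned on the task.
        best_next = max(self.q[(task, next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(task, state, action)] += self.lr * (target - self.q[(task, state, action)])
```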
The result is highly coordinated behavior in a multi-task environment, with no single point of failure and no single way for an agent to reach its goal. In a video-processing example, for instance, the camera tracks the object around its lens while the object itself moves across the screen. That coordination makes the system flexible and stable: in most cases nothing needs to be retrained when one part is misplaced. What's more, agents can interact with multiple objects simultaneously without one part getting jammed into another and damaging both.
The paper describes four phases of the project: i) building the foundation, ii) developing the models, iii) training them, and iv) measuring the final results.
Building The Foundation

The basic idea is to create a library of tasks that can be applied in different scenarios. Most importantly, it includes real-world examples for easy evaluation.
In the first phase, the foundation of multi-task learning is built by creating a set of tasks that have already been solved by humans. We did this by combining existing neural nets with image-recognition models. In their studies, Duy and Zhou used CNNs to recognize hand movements, then combined them with pre-trained networks to build computer-vision models. Several companies now offer neural-net products, and these models provide a low-level platform for constructing neural networks. But what about understanding the underlying logic?
One of the most important parts of modeling neural nets is finding out what has been learned. With our model, we can discover what kinds of transformations are necessary to perform a series of tasks; from an input image, for example, we can find how many edges connect to the corners.
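One common way to probe what a network has learned is to inspect its first-layer convolutional filters, which typically act as edge and corner detectors. The sketch below uses a pretrained torchvision ResNet purely as a stand-in; it is not the model from the study.

```python
import torch
from torchvision import models

# Illustrative sketch: inspect the first convolutional layer of a
# pretrained CNN. Early filters usually act as oriented edge detectors,
# which is one concrete way to see "what kinds of transformations"
# the network has learned. (Stand-in model; not the one in the paper.)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
filters = model.conv1.weight.detach()          # shape: (64, 3, 7, 7)

# Summarize each filter by the magnitude of its horizontal/vertical
# gradients -- a crude proxy for how "edge-like" it is.
dx = filters[..., :, 1:] - filters[..., :, :-1]
dy = filters[..., 1:, :] - filters[..., :-1, :]
edge_score = dx.abs().mean(dim=(1, 2, 3)) + dy.abs().mean(dim=(1, 2, 3))
print("Most edge-like filters:", edge_score.topk(5).indices.tolist())
```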
Another example concerns visual planning. Imagine you want to build a house in just two days. You know where the roof goes, but you don't yet know how to build it. So you start by making sure your window sills are right-angled and by laying planks beneath them. As you progress, you start thinking about things like which window to draw first and whether you need to change the location of the windows. Eventually you have figured everything out, and as soon as you need extra space, you are ready.
And that’s what multi-task training is.
Modeling the Models
The second stage of the work involves designing the mathematical structure, representing tasks as a computational graph: a graph of mathematical operations. After this exercise it is easy to see why people consider ML approaches so useful. They help us find exact solutions in reasonable time and at scale, and they make it easier to design and modify models, so we are no longer limited by traditional constraints on memory and computation. We've discussed three types of architectures in previous blog posts; here we'll focus on neural nets.
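To make the "graph of mathematical operations" concrete, here is a minimal sketch using PyTorch autograd, which records every operation into such a graph so that gradients can be propagated back through it. PyTorch is just one convenient choice; the paper does not name a framework.

```python
import torch

# A tiny computational graph: y = (w * x + b)^2.
# Each operation is recorded, so calling backward() walks the graph
# and produces gradients with respect to the leaf tensors w and b.
x = torch.tensor(3.0)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

y = (w * x + b) ** 2   # graph: mul -> add -> pow
y.backward()

print(w.grad)  # dy/dw = 2*(w*x+b)*x = 2*7*3 = 42
print(b.grad)  # dy/db = 2*(w*x+b)   = 14
```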
One of the biggest advantages of neural networks is flexibility. Companies with large amounts of data (Google, Facebook, Amazon) don't restrict themselves to a single model that doesn't fit their data well; they maintain thousands of models that are very effective. Neural networks are really good at automating complex tasks. Anecdotally, however, I am always frustrated by how often neural nets get stuck in local minima and fail on me for no discernible reason, usually due to a bad initialization, mismodeling, and so on. One way to avoid this is to train multiple networks instead of one.
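A minimal sketch of that idea, assuming a toy regression problem: train several small networks from different random seeds and average their predictions, so any one member stuck in a poor local minimum has limited influence. Architecture and hyperparameters here are arbitrary.

```python
import torch
import torch.nn as nn

def train_one(seed, X, y, epochs=200):
    # Each ensemble member starts from a different random initialization.
    torch.manual_seed(seed)
    net = nn.Sequential(nn.Linear(X.shape[1], 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(X), y)
        loss.backward()
        opt.step()
    return net

# Toy regression data, just to make the sketch runnable.
X = torch.randn(256, 4)
y = X.sum(dim=1, keepdim=True) ** 2

ensemble = [train_one(seed, X, y) for seed in range(5)]

# Average the ensemble's predictions; a member stuck in a poor local
# minimum is diluted by the others.
with torch.no_grad():
    pred = torch.stack([net(X) for net in ensemble]).mean(dim=0)
```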
The motivation for using both is to achieve higher learning accuracy while also making stronger assumptions about the network. Once again, an agent must act within a given environment. If an agent cannot observe the whole state of the environment, it needs to be able to select actions that will lead to rewards. At a deeper level, this is akin to choosing among different paths in life. There is just one problem: each path exists within a certain area of the world. As the environment changes, so does the distribution of rewards, and each agent may need to take a different action.
To overcome this challenge, we introduce a modified definition of the reward distribution, the Q-function over states: it considers the current situation and finds a solution that maximizes expected reward.
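For reference, the standard Q-function that this modification builds on is the expected discounted return of taking action a in state s and acting greedily afterwards; the paper's modified, multi-goal variant is not reproduced here.

```latex
% Standard action-value (Q) function and its Bellman recursion.
Q^{*}(s,a) \;=\; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_{0}=s,\ a_{0}=a\right]
\;=\; \mathbb{E}\!\left[\, r + \gamma \max_{a'} Q^{*}(s',a') \,\middle|\, s,a \right]
```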
If an agent has multiple goals, each with its own rewards, the Q-function becomes significantly harder to solve. Instead of considering every possible scenario, we need a simpler way to model those rewards, and that is exactly what we built.
Modeling neural-net architectures has been one of my favorite areas of study in recent years. Here we can construct neural nets that are easy to understand, learn by trial and error, and get better with practice. A lot of interesting work is happening in this field, including applications of machine learning in health care, drug discovery, consumer tech, energy production, social media, natural language processing, medical diagnosis, and traffic-flow management. The core concept behind the present study is the importance of treating an agent as an entity in its own right: just as a person is a person, a tool is a tool. Our mission is to help agents understand that, and to build powerful software applications they can rely on.
So far, most practical demonstrations involve a large number of agents working cooperatively to maximize a common objective. An advantage of multi-task training is that it enables agents with separate goals to cooperate with one another and increase efficiency.
Modeling Agents: Training, Actions, Rewards
When applying neural-net techniques to this work, we first define an agent, which represents what the program is trying to achieve. Next, we define task spaces for achieving that goal. Finally, we construct policies that represent how the agent decides to respond to different situations.
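Loosely translated into code (all names are hypothetical, not taken from the paper), that workflow amounts to three objects: an agent with a goal, a task space of available actions, and a policy that maps situations to actions.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical sketch of the three pieces described above: an agent
# (with a goal), a task space (available actions), and a policy
# (how the agent responds to a given situation).

@dataclass
class TaskSpace:
    actions: List[str]                 # e.g. ["grasp", "rotate", "place"]

@dataclass
class Policy:
    decide: Callable[[str], str]       # situation -> chosen action

@dataclass
class Agent:
    goal: str
    tasks: TaskSpace
    policy: Policy

    def step(self, situation: str) -> str:
        return self.policy.decide(situation)

# Example usage with a trivial hand-written policy.
tasks = TaskSpace(actions=["grasp", "rotate", "place"])
policy = Policy(decide=lambda s: "grasp" if s == "tool_visible" else "rotate")
agent = Agent(goal="assemble_part", tasks=tasks, policy=policy)
print(agent.step("tool_visible"))      # -> "grasp"
```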
We show the results in graphical form. The Q-function is modeled as a Gaussian distribution, so the policy can be expressed as a high-dimensional continuous function, and the reward is modeled as the mean of the action returns.
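One way to read that setup is that the policy outputs the parameters of a Gaussian over continuous actions. Below is a minimal PyTorch sketch of such a Gaussian policy; it is an assumption about the parameterization, not the paper's exact model.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    # Maps a state to the mean and (log) standard deviation of a
    # Gaussian over continuous actions, as sketched in the text.
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.mean = nn.Linear(state_dim, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state):
        dist = torch.distributions.Normal(self.mean(state), self.log_std.exp())
        action = dist.sample()
        return action, dist.log_prob(action).sum(-1)

# Dimensions are placeholders for illustration only.
policy = GaussianPolicy(state_dim=8, action_dim=2)
action, logp = policy(torch.randn(8))
```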
The top two layers of the multi-task agent model are neural nets. The rest of the model, represented by non-neural layers, consists of several layers of convolutional processing. All these layers add information and enable the agents to act efficiently and effectively on their task in the environment. Below we show a visualization of one particular step, in which the agent identifies an appropriate reward: the green rectangle contains the desired reward, and the yellow triangle marks the reward the agent is currently seeking. As mentioned earlier, in this experiment the agent is trained as a quadcopter, meaning it moves along straight lines. During testing we show reward sequences on the left; during training the agent decides which action to take.
We use an attention mechanism, which helps the model assign relevant weights to relevant features. We can then use RPOAN to tune the weighting of the non-neural network's output while leaving a small portion for other features.
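The attention step can be sketched as a softmax over learned relevance scores that reweights features before they are combined. This is a generic attention weighting, not RPOAN itself; the scorer and dimensions are placeholders.

```python
import torch
import torch.nn.functional as F

def attention_weights(features, scorer):
    # features: (n_features, feature_dim); scorer maps each feature
    # vector to a scalar relevance score. Softmax turns the scores
    # into weights that sum to 1, so relevant features dominate the
    # weighted combination.
    scores = scorer(features).squeeze(-1)          # (n_features,)
    weights = F.softmax(scores, dim=-1)
    weighted = (weights.unsqueeze(-1) * features).sum(dim=0)
    return weights, weighted

scorer = torch.nn.Linear(16, 1)                    # placeholder scorer
feats = torch.randn(5, 16)                         # placeholder features
w, combined = attention_weights(feats, scorer)     # w sums to 1
```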
Finally, with the model trained, we apply the RL algorithm to find optimal reward values for the training task.
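The paper does not say which RL algorithm is applied, so as a generic stand-in, here is a tiny REINFORCE-style loop on a toy one-step environment; the network, environment, and hyperparameters are all illustrative.

```python
import torch
import torch.nn as nn

# Generic policy-gradient loop as a stand-in for "apply the RL
# algorithm": the paper does not name a specific method, so this is
# only an illustrative sketch on a toy one-step environment.
policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def toy_env(state, action):
    # Reward action 0 when the first state feature is positive.
    return 1.0 if (action == 0) == bool(state[0] > 0) else 0.0

for _ in range(1000):
    state = torch.randn(4)
    dist = torch.distributions.Categorical(logits=policy(state))
    action = dist.sample()
    reward = toy_env(state, action.item())
    # Policy gradient: push up the log-probability of rewarded actions.
    loss = -dist.log_prob(action) * reward
    opt.zero_grad()
    loss.backward()
    opt.step()
```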
Experiments
We evaluated our model on three different tasks: first, manual assembly; second, simultaneous arm movements; and third, warehouse floor stacking.
We evaluated 10 agents for each case and measured their average reward and error rate over time.
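Computing those two curves is straightforward; here is a sketch, assuming reward and error matrices of shape (agents, time steps), with the values below purely synthetic.

```python
import numpy as np

# rewards[i, t] = reward of agent i at time step t;
# errors[i, t]  = 1 if agent i made an error at step t, else 0.
# Shapes and values are assumed for illustration (10 agents, 500 steps).
rewards = np.random.rand(10, 500)
errors = (np.random.rand(10, 500) < 0.1).astype(float)

avg_reward_over_time = rewards.mean(axis=0)   # one value per time step
error_rate_over_time = errors.mean(axis=0)
print(avg_reward_over_time[:5], error_rate_over_time[:5])
```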