Integrating planning and interpretable goal recognition for autonomous driving

Author: Cillian Brewitt

Date: 2020-05-12

An important problem in autonomous driving is working out the goals and intentions of other road users, and predicting what they will do next. Take the example shown in Figure 1. The blue vehicle is our vehicle, which needs to enter the junction and turn left to reach the goal G1. Vehicle 2 has right of way over our vehicle, which at first glance prevents us from entering the junction. However, by observing the oncoming vehicle 3, and the fact that vehicle 2 has stopped to give way to this vehicle, we can recognise that vehicle 2 is trying to reach the goal G2. Given this knowledge, we can predict that the vehicle will not cross our path and we can safely enter the junction without waiting any longer.

Figure 1: Our vehicle (blue) intends to reach goal G1

There are several factors which make predicting a vehicle's trajectory a difficult problem. The predictions must be both accurate and fast to compute, based on limited amounts of information. In addition to this, the actions that different road users take can be highly coupled and interactive. Ideally, the way in which the prediction method produces the predictions should be interpretable by humans. It is also necessary to have a way of integrating the predictions into planning and decision making.

Many current prediction methods fail to solve all of these issues, or have other shortcomings. Some methods use deep neural networks to generate possible future trajectories for vehicles [7-12]; however, these methods are uninterpretable, require large amounts of data to train, and can be computationally expensive. Other methods simplify the problem by assuming the vehicles are executing a maneuver from a known set of maneuvers such as changing lane or turning at a junction [2-6]. These methods then try to identify the vehicle's current maneuver. However, this approach can only give predictions over a very short time period, until the end of the current maneuver. Such predictions would be of limited use in a situation such as the one given in Figure 1.

Our method: Integrating Goal Recognition with Monte Carlo Tree Search

The method proposed in our recent paper [1] overcomes some of these problems. An overview of the method is shown in Figure 2. As part of our prediction method we use goal recognition by inverse planning. This process is highly interpretable and can give longer term predictions than just the current maneuver. Our method infers a probability distribution over goals and a distribution over trajectories for each of these goals. We have integrated these predictions with a Monte Carlo Tree Search (MCTS) planner, which computes an appropriate sequence of maneuvers for our vehicle given the predictions. For both prediction and planning, we use a set of macro actions or predefined maneuver sequences to make the problem more tractable. Some examples of basic maneuvers are lane-follow, change lane left/right, or give way, and examples of macro actions are continue to the lane end, or exit left/right at an upcoming junction.
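As a rough illustration of the hierarchy described above, a macro action can be thought of as a predefined sequence of basic maneuvers. The sketch below is purely illustrative; the names are paraphrased from the paper and are not the authors' actual implementation:

```python
# Hypothetical sketch of the maneuver / macro-action hierarchy.
# Names are illustrative, not taken from the authors' code.

# A macro action maps to a predefined sequence of basic maneuvers,
# which shrinks the search space for both goal recognition and
# MCTS planning.
MACRO_ACTIONS = {
    "continue-to-lane-end": ["lane-follow"],
    "exit-left-at-junction": ["lane-follow", "give-way", "turn-left"],
    "exit-right-at-junction": ["lane-follow", "give-way", "turn-right"],
}

def expand(plan):
    """Expand a plan of macro actions into its basic maneuver sequence."""
    return [m for macro in plan for m in MACRO_ACTIONS[macro]]
```

Planning over macro actions rather than individual maneuvers means both the A* goal-recognition search and the MCTS planner explore far shorter action sequences.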

Figure 2: Overview of our method

When making predictions for a vehicle, we first use a classifier to obtain a probability distribution over the possible current maneuvers for the vehicle, and predict the position where the vehicle would be once the maneuver has been completed. For each of these positions, we then generate a set of possible goals based on the road layout, with the goal likelihoods obtained through inverse planning. For each goal, the inverse planning process involves using A* search to find the optimal plan from the vehicle's current position, and the optimal plan from the position where we first observed the vehicle. The goal likelihood is defined in terms of the cost difference between these two plans – a smaller cost difference leads to a higher likelihood. Through this mechanism, if the vehicle's actions are similar to the optimal actions needed to reach the goal, then the goal likelihood should be high. Once we have the goal likelihoods, we can then use these to calculate Bayesian posterior probabilities for the set of goals. We then generate a set of trajectories for each goal, with the likelihood of each trajectory based on its cost. Inverse planning is based on the assumption that the agents are "approximately rational", meaning that their plans are close to optimal with respect to a given cost metric. This allows us to form an intuitive interpretation of why the system gives certain predictions.
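The cost-difference likelihood and Bayesian update described above can be sketched in a few lines. This is a minimal illustration, assuming a Boltzmann-style likelihood exp(-β·Δcost) and a uniform prior by default; the paper's actual cost function and planner are considerably more involved:

```python
import math

def goal_posterior(cost_from_current, cost_from_initial, beta=1.0, prior=None):
    """Infer a posterior over goals from plan costs (illustrative sketch).

    cost_from_current[g]: cost of the optimal plan to goal g that passes
        through the vehicle's observed behaviour so far.
    cost_from_initial[g]: cost of the optimal plan to g from the position
        where the vehicle was first observed.
    A small cost difference means the observed behaviour is close to
    optimal for that goal, so the goal likelihood is high.
    """
    goals = list(cost_from_current)
    if prior is None:
        prior = {g: 1.0 / len(goals) for g in goals}
    # Boltzmann likelihood: smaller cost difference -> higher likelihood.
    likelihood = {
        g: math.exp(-beta * (cost_from_current[g] - cost_from_initial[g]))
        for g in goals
    }
    unnorm = {g: likelihood[g] * prior[g] for g in goals}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}
```

For example, if a vehicle's observed behaviour adds only 0.5 cost units over the optimal plan to G2 but 2.0 over the optimal plan to G1, the posterior concentrates on G2.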

We use an MCTS planner to create a plan for our vehicle. During each planning step, MCTS runs many simulations of the current traffic scenario, and iteratively builds up a tree of possible actions, along with the value of each action. We use our predictions to simulate the behaviour of other vehicles during MCTS planning. During each MCTS iteration, we sample a current maneuver, goal, and trajectory for each vehicle from the distributions that we have inferred.
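The per-iteration sampling step can be sketched as follows. This is a hedged illustration, not the authors' implementation: the dictionary layout and function names are assumptions, and the inferred distributions are represented as lists of (option, probability) pairs:

```python
import random

def sample_vehicle_behaviours(vehicles):
    """For one MCTS iteration, sample a current maneuver, then a goal,
    then a trajectory for each other vehicle from its inferred
    distributions. Each entry of `vehicles` is assumed to hold lists
    of (option, probability) pairs (a hypothetical layout).
    """
    def sample(dist):
        options, probs = zip(*dist)
        return random.choices(options, weights=probs, k=1)[0]

    sampled = {}
    for vid, v in vehicles.items():
        maneuver = sample(v["maneuver_dist"])
        goal = sample(v["goal_dist"][maneuver])        # goals depend on maneuver
        trajectory = sample(v["trajectory_dist"][goal])  # trajectories depend on goal
        sampled[vid] = (maneuver, goal, trajectory)
    return sampled
```

Sampling afresh in every iteration, rather than committing to the single most likely behaviour, lets the tree search account for the full uncertainty in the predictions.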

Simulated Scenarios

We tested our method across four diverse simulated traffic scenarios, shown below. The coloured dots represent goals, and the corresponding coloured bars next to each vehicle show the inferred goal likelihoods. In Scenario 1, a vehicle cuts in front of our vehicle, and because of this, we can infer that the vehicle is going to slow down and turn at the junction. To avoid having to slow down as well, our vehicle changes lane. In Scenario 2, our vehicle aims to enter the junction and turn left. The vehicle coming from the west has stopped at the junction. If this vehicle's goal was to go straight on or turn left it would not need to stop. However, if its goal is to turn right, it must stop to give way to the vehicle coming from the east. Because of this, we can infer that this vehicle will stay stopped long enough for our vehicle to safely enter the junction.

Scenario 1

Scenario 2

In Scenario 3, our vehicle aims to reach the green goal at the east exit of the roundabout. After we observe that the other vehicle has switched to the outer lane of the roundabout, we can infer that its goal is to take the southern exit at the roundabout, so it is safe for our vehicle to enter the roundabout. In Scenario 4, our vehicle aims to turn right at the junction, but must merge into traffic to do so. When the vehicle from the south first comes to a stop, it is given a higher likelihood of turning right, since to do so it would need to give way to the oncoming vehicle. However, as the vehicle remains stopped for a longer period of time, the likelihood that its goal is to let our vehicle merge rises, until our vehicle recognises that it is safe to merge.

Scenario 3

Scenario 4

Results

We compared our method to several baseline methods, with the resulting average time for our vehicle to reach its goal when using each method and scenario shown in Figure 3. One baseline (MAP) took the maximum a posteriori estimate for goal probabilities: during MCTS simulations, it used the most likely goal and trajectory for each vehicle, rather than sampling from the distributions as the full method does. For another baseline we modified our method to assume that other vehicles will always continue at a constant velocity (CVel). We also included a "conservative" baseline in which our vehicle always gives way if there is any oncoming traffic. All methods were able to complete the scenarios, but the time taken to do so varied.

Figure 3: Comparison of our method with several baselines

In Scenarios 2 and 4, the conservative method required more time than all other methods, as it could not enter the junction until the oncoming vehicle had passed. The CVel method was still able to enter the junctions in these scenarios, as the stopped vehicles are predicted to remain permanently stopped. In Scenario 3, however, both the CVel and conservative methods could not enter the roundabout until the other vehicle had exited, and so they have a higher time to goal than the other methods. Another downside of the CVel and conservative methods is that they are unable to explain the observed behaviours of other vehicles. In Scenario 3, MAP has a lower time to goal than the full method. This is because the MAP method is more certain in its predictions, which allows it to reach the goal more quickly on average. This works in our simulated test scenarios, but in real life could lead to dangerous situations.

Figure 4: Inferred probability of the correct goal for vehicles in each scenario

Figure 4 shows how the inferred probability of the correct goal for vehicles in each of our scenarios evolves over time. As we collect a greater number of observations of a vehicle, our certainty that this is the correct goal increases, as the possibility of other goals gets ruled out. In most cases, this probability approaches one by the time the vehicle finally reaches its goal.

Conclusions and Future Work

We have presented a system for autonomous driving which is interpretable, and makes and uses predictions over a long time horizon. We have achieved this by integrating goal recognition by inverse planning with an MCTS planner, while using a set of basic maneuvers and macro actions to make the problem tractable.

There are several future directions in which this work could be extended. One is to test our method in an open world with randomly placed vehicles, which would lead to a much more diverse range of driving situations than our four handcrafted scenarios. Another addition to the system could be anomaly detection, where we detect whether a vehicle has a goal or maneuver which we do not recognise. Finally, rather than having fixed predictions for other vehicles, multi-agent MCTS could be used to more accurately model interactions between vehicles.

References

[1] S. V. Albrecht, C. Brewitt, J. Wilhelm, F. Eiras, M. Dobre and S. Ramamoorthy. Integrating Planning and Interpretable Goal Recognition for Autonomous Driving. arXiv preprint arXiv:2002.02277, 2020.

[2] E. Galceran, A. Cunningham, R. Eustice, and E. Olson. Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction: Theory and experiment. Autonomous Robots, 41(6):1367–1382, 2017.

[3] C. Dong, J. M. Dolan, and B. Litkouhi. Smooth behavioral estimation for ramp merging control in autonomous driving. In IEEE Intelligent Vehicles Symposium, pages 1692–1697. IEEE, 2018.

[4] C. Hubmann, M. Becker, D. Althoff, D. Lenz, and C. Stiller. Decision making for autonomous driving considering interaction and uncertain prediction of surrounding vehicles. In IEEE Intelligent Vehicles Symposium (IV), pages 1671–1678. IEEE, 2017.

[5] C. Hubmann, J. Schulz, M. Becker, D. Althoff, and C. Stiller. Automated driving in uncertain environments: Planning with interaction and uncertain maneuver prediction. IEEE Transactions on Intelligent Vehicles, 3(1): 5–17, 2018.

[6] W. Song, G. Xiong, and H. Chen. Intention-aware autonomous driving decision-making in an uncontrolled intersection. Mathematical Problems in Engineering, 2016, 2016.

[7] Y. Xu, T. Zhao, C. Baker, Y. Zhao, and Y. N. Wu. Learning trajectory prediction with continuous inverse optimal control via Langevin sampling of energy-based models. arXiv preprint arXiv:1904.05453, 2019.

[8] N. Lee, W. Choi, P. Vernaza, C. B. Choy, P. H. Torr, and M. Chandraker. DESIRE: distant future prediction in dynamic scenes with interacting agents. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 336–345, 2017.

[9] Y. Chai, B. Sapp, M. Bansal, and D. Anguelov. MultiPath: multiple probabilistic anchor trajectory hypotheses for behavior prediction. In Proceedings of the 3rd Conference on Robot Learning, 2019.

[10] Y. Xu, T. Zhao, C. Baker, Y. Zhao, and Y. N. Wu. Learning trajectory prediction with continuous inverse optimal control via Langevin sampling of energy-based models. arXiv preprint arXiv:1904.05453, 2019.

[11] N. Rhinehart, R. McAllister, K. Kitani, and S. Levine. PRECOG: prediction conditioned on goals in visual multiagent settings. In Proceedings of the IEEE International Conference on Computer Vision, pages 2821–2830, 2019.

[12] M. Wulfmeier, D. Z. Wang, and I. Posner. Watch this: Scalable cost-function learning for path planning in urban environments. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2089–2095. IEEE, 2016.