Overview of my research and selected projects

An overview of my research on developing safe and explainable robots that interact with humans.


Left: Reward modeling and prediction via Gaussian processes and deep neural networks. Right: Extracted specification-consistent behaviors in simulations, including Nvidia Isaac.
Learning reward functions and control policies that satisfy temporal-logic specifications

Designing dense or “informative” reward functions for Reinforcement Learning (RL) is a highly non-trivial task, and errors in reward design can lead to unsafe and undesirable learned control behaviors. This work introduces a neurosymbolic learning-from-demonstrations (LfD) framework that uses high-level tasks expressed in Signal Temporal Logic (STL), together with user demonstrations, to extract reward functions and control policies via reinforcement learning. The LfD-STL framework enables an agent to learn non-Markovian (temporal) rewards and overcomes critical safety and performance issues faced by inverse reinforcement learning methods. The framework was initially developed for discrete, deterministic environments and was later generalized to continuous spaces and stochastic environments via Gaussian process and neural-network modeling.
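As a minimal illustration of the core idea (a sketch, not the framework's actual implementation): STL's quantitative semantics assign a signed robustness value to a trace, which can serve as a reward over entire trajectories: positive margins indicate satisfaction, negative ones violation. The operator and predicate names below are illustrative.

```python
def robustness_always(trace, mu):
    """Robustness of G(mu > 0): worst-case margin over the whole trace."""
    return min(mu(x) for x in trace)

def robustness_eventually(trace, mu):
    """Robustness of F(mu > 0): best-case margin over the whole trace."""
    return max(mu(x) for x in trace)

# Example: a 1-D state trace and two specifications.
trace = [1.0, 2.0, 0.5, 1.2]

# "Always stay above 0" -> margin is the closest approach to the boundary.
safety_reward = robustness_always(trace, lambda x: x - 0.0)

# "Eventually exceed 1.5" -> margin of the best state reached.
goal_reward = robustness_eventually(trace, lambda x: x - 1.5)
```

Because the robustness is computed over the whole trace rather than a single state, it naturally encodes the temporal, non-Markovian structure of the task.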

Learning to improve/extrapolate beyond demonstrator performance

Generally, a machine learning model’s performance is determined by the quality and amount of data it is trained on. Noisy data and limited human demonstrations, both widely observed in robotic settings, therefore pose a challenge to learning optimal behaviors. This work on neurosymbolic apprenticeship learning implements temporal logic-guided reinforcement learning from demonstrations to automatically improve robot safety and performance via self-monitoring and adaptation. The capabilities of the framework are exhibited on a variety of mobile-navigation, fixed-base manipulation and mobile-manipulation tasks in the Nvidia Isaac simulator. This paper was accepted at IROS 2024 and will be presented in Abu Dhabi this October. Additional details can be found in the supplemental document.
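The self-monitoring idea can be caricatured as follows (a sketch with made-up robustness scores, not the paper's algorithm): score every demonstration and every policy rollout with STL robustness, and declare extrapolation beyond the demonstrator when even the policy's worst rollout beats the best demonstration.

```python
def extrapolates_beyond_demos(policy_robustness, demo_robustness):
    """True when the policy's worst rollout outperforms the best demonstration.

    Both arguments are lists of STL robustness scores, one per trajectory.
    """
    return min(policy_robustness) > max(demo_robustness)

demo_scores = [0.2, 0.5, -0.1]   # robustness of each (noisy) human demo
policy_scores = [0.6, 0.8]       # robustness of learned-policy rollouts
improved = extrapolates_beyond_demos(policy_scores, demo_scores)
```

A monitor of this kind gives the agent a concrete, specification-grounded signal for when to keep adapting and when the demonstrations have been surpassed.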

  • Puranic, A. G., Deshmukh, J. V., & Nikolaidis, S. (2024). Signal Temporal Logic-Guided Apprenticeship Learning. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
  • Puranic, A. G., Deshmukh, J. V., & Nikolaidis, S. (2021). Learning From Demonstrations Using Signal Temporal Logic in Stochastic and Continuous Domains. IEEE Robotics and Automation Letters (RA-L). Presented at IROS, 6(4), 6250–6257.
  • Puranic, A. G., Deshmukh, J. V., & Nikolaidis, S. (2021). Learning from Demonstrations using Signal Temporal Logic. Proceedings of the 2020 Conference on Robot Learning (CoRL), 155, 2228–2242.


Left: Generating graphs that explain demonstrator performance and formal specification conflicts. Center: Neural reward modeling from inferred graphs. Right: Mining formal specifications from time-series data.
Generating explainable temporal logic graphs from human data

Understanding and evaluating human demonstrations and learned robot behaviors plays a critical role in optimizing robot control policies; without such evaluation, a robot may infer incorrect reward functions that lead to undesirable or unsafe control policies. The prior LfD-STL framework required demonstrators to explicitly specify their preferences by ranking the STL specifications, with the ranked specifications represented as a directed acyclic graph (DAG) capturing preferences and dependencies. To relax this manual burden, our novel Performance Graph Learning (PeGLearn) algorithm automatically infers the specification DAG from demonstrations. We show that PeGLearn facilitates explainability for AI-based systems via a user study in CARLA, a simulated driving environment. We also integrate human feedback (annotations) in a robot-assisted surgical domain to model surgeon behaviors according to their expertise. Additional details can be found in the supplemental document.
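To make the DAG representation concrete, here is a hypothetical sketch (the specification names and the depth-based weighting are illustrative, not the PeGLearn algorithm itself): specifications are nodes, edges point from higher-priority specifications to those they dominate, and specifications closer to the roots receive larger reward weights.

```python
def spec_weights(dag):
    """Assign each specification a weight from its depth in the preference DAG.

    dag: adjacency dict mapping each spec to the specs it dominates.
    Roots (highest-priority specs) receive the largest weights.
    """
    parents = {n: [] for n in dag}
    for node, children in dag.items():
        for child in children:
            parents[child].append(node)

    memo = {}
    def depth(n):
        # Longest distance from any root, via memoized DFS over parents.
        if n not in memo:
            memo[n] = 0 if not parents[n] else 1 + max(depth(p) for p in parents[n])
        return memo[n]

    max_d = max(depth(n) for n in dag)
    # Shallower (higher-priority) specs get exponentially larger weights.
    return {n: 2.0 ** (max_d - depth(n)) for n in dag}

# "safety" dominates "reach_goal", which dominates "comfort".
dag = {"safety": ["reach_goal"], "reach_goal": ["comfort"], "comfort": []}
weights = spec_weights(dag)
```

Inferring this graph automatically from demonstrations, rather than asking users to rank specifications by hand, is precisely the burden PeGLearn removes.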

Learning (mining) specifications from temporal data

Autonomous cyber-physical systems such as self-driving cars, unmanned aerial vehicles, general-purpose robots, and medical devices can often be modeled as systems of heterogeneous components. Understanding the behavior of such components, especially those equipped with deep learning, at an abstract level is a significant challenge. Our work seeks to answer: given a requirement on the system output behaviors, what are the assumptions on the model environment, i.e., inputs to the model, that guarantee the corresponding output traces satisfy the output requirement? We develop techniques involving decision-tree classifiers, counterexample-guided learning, optimization, enumeration, and parameter mining to extract STL specifications that explain system behaviors.
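A toy instance of parameter mining (a deliberate simplification of the techniques above): given a set of observed output traces and the template specification G(x < c), find the tightest threshold c for which every trace satisfies the formula.

```python
def mine_tightest_bound(traces, margin=0.0):
    """Mine the smallest c such that G(x < c) holds on all traces.

    Any c strictly above the largest observed value satisfies the template;
    the tightest such bound is the data's supremum plus an optional margin.
    """
    return max(max(trace) for trace in traces) + margin

traces = [
    [0.1, 0.9, 0.4],
    [0.3, 0.7, 1.1],
]
c = mine_tightest_bound(traces, margin=0.05)
# Every observed value is strictly below the mined bound.
assert all(x < c for trace in traces for x in trace)
```

Real templates have multiple interacting parameters (thresholds, time bounds), which is where the enumerative and optimization-based machinery becomes necessary.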

  • Puranic, A. G., Deshmukh, J. V., & Nikolaidis, S. (2023). Learning Performance Graphs From Demonstrations via Task-Based Evaluations. IEEE Robotics and Automation Letters (RA-L). Oral Presentation at ICRA, 8(1), 336–343.
  • Mohammadinejad, S., Deshmukh, J. V., Puranic, A. G., Vazquez-Chanlatte, M., & Donzé, A. (2020). Interpretable Classification of Time-Series Data Using Efficient Enumerative Techniques. Proceedings of the 23rd International Conference on Hybrid Systems: Computation and Control (HSCC).
  • Mohammadinejad, S., Deshmukh, J. V., & Puranic, A. G. (2020). Mining Environment Assumptions for Cyber-Physical System Models. 2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS), 87–97.


Evaluating the quality of vision-based perception algorithms

Computer vision is one of the major perception components of a cyber-physical system, with numerous applications in autonomous vehicles, industrial/factory robotics, medical devices, etc. Checking the correctness and ensuring the robustness of perception algorithms, such as those based on deep convolutional neural networks, is a major challenge. Conventionally, perception algorithms are tested by comparing their performance against ground-truth labels, which requires a laborious annotation process. We propose Timed Quality Temporal Logic (TQTL) as a formal language for expressing desirable spatio-temporal properties of a perception algorithm processing a video, offering an alternative metric that can provide useful information even in the absence of ground-truth labels.
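As a hypothetical illustration of the kind of property such a logic can express (a hand-rolled check, not the TQTL monitor): "every object detected with high confidence in a frame must still be detected in the next frame." Violations flag temporally inconsistent detections with no ground-truth labels involved.

```python
def check_persistence(frames, conf_threshold=0.8):
    """Flag (frame_index, object_id) pairs where a confident detection vanishes.

    frames: list of dicts mapping object id -> detection confidence, one per frame.
    """
    violations = []
    for t in range(len(frames) - 1):
        for obj_id, conf in frames[t].items():
            if conf >= conf_threshold and obj_id not in frames[t + 1]:
                violations.append((t, obj_id))
    return violations

# Three frames of detections: pedestrian "ped1" flickers out at frame 1.
frames = [
    {"ped1": 0.95, "car3": 0.70},
    {"car3": 0.75},
    {"ped1": 0.90, "car3": 0.72},
]
flicker = check_persistence(frames)  # [(0, "ped1")]
```

A metric built from such properties scores a detector on internal spatio-temporal consistency, exactly the kind of signal that survives when annotation budgets run out.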

Vision-based metric for evaluating surgeon’s performance

Due to the lack of instrument force feedback during robot-assisted surgery, tissue-handling technique is an important aspect of surgical performance to assess. We develop a vision-based machine learning algorithm for object detection and distance prediction to measure needle entry point deviation in tissue during robotic suturing as a proxy for tissue trauma.

  • Balakrishnan, A., Puranic, A. G., Qin, X., Dokhanchi, A., Deshmukh, J. V., Ben Amor, H., & Fainekos, G. (2019). Specifying and Evaluating Quality Metrics for Vision-based Perception Systems. 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1433–1438.
  • Puranic, A., Chen, J., Nguyen, J., Deshmukh, J., & Hung, A. (2019). MP35-04 Automated Evaluation of Instrument Force Sensitivity During Robotic Suturing Utilizing Vision-Based Machine Learning. Journal of Urology, 201(Supplement 4), e505–e506.