Overview of my research and selected projects

An overview of my research on developing safe and explainable robots that interact with humans.


Left: Reward modeling and prediction via Gaussian processes and deep neural networks. Right: Extracted specification-consistent behaviors in simulations, including Nvidia Isaac.
Learning reward functions and control policies that satisfy temporal-logic specifications

Designing dense or “informative” reward functions for Reinforcement Learning (RL) is a highly non-trivial task, and errors in reward design can lead to unsafe and undesirable learned control behaviors. This work introduces a neurosymbolic learning-from-demonstrations (LfD) framework that uses high-level tasks expressed in Signal Temporal Logic (STL), together with user demonstrations, to extract reward functions and control policies via reinforcement learning. The LfD-STL framework enables an agent to learn non-Markovian (temporal) rewards and overcomes critical safety and performance issues faced by inverse reinforcement learning methods. The framework was initially developed for discrete, deterministic environments and was later generalized to continuous spaces and stochastic environments via Gaussian process and neural-network modeling.
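As a minimal illustration of the core idea (a sketch, not the framework's actual implementation): STL's quantitative semantics assign a signed robustness value to a trace, which can serve as a reward over entire trajectories: positive margins indicate satisfaction, negative ones violation. The operator and predicate names below are illustrative.

```python
def robustness_always(trace, mu):
    """Robustness of G(mu > 0): worst-case margin over the whole trace."""
    return min(mu(x) for x in trace)

def robustness_eventually(trace, mu):
    """Robustness of F(mu > 0): best-case margin over the whole trace."""
    return max(mu(x) for x in trace)

# Example: a 1-D state trace and two specifications.
trace = [1.0, 2.0, 0.5, 1.2]

# "Always stay above 0" -> margin is the closest approach to the boundary.
safety_reward = robustness_always(trace, lambda x: x - 0.0)

# "Eventually exceed 1.5" -> margin of the best state reached.
goal_reward = robustness_eventually(trace, lambda x: x - 1.5)
```

Because the robustness is computed over the whole trace rather than a single state, it naturally encodes the temporal, non-Markovian structure of the task.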

Learning to improve/extrapolate beyond demonstrator performance

Generally, a machine learning model’s performance is determined by the quality and amount of data it is trained on. Noisy data and limited human demonstrations, both widely observed in robotic settings, therefore pose a challenge to learning optimal behaviors. This work on neurosymbolic apprenticeship learning implements temporal logic-guided reinforcement learning from demonstrations to automatically improve robot safety and performance via self-monitoring and adaptation. The capabilities of the framework are exhibited on a variety of mobile-navigation, fixed-base manipulation and mobile-manipulation tasks in the Nvidia Isaac simulator. This paper was accepted at IROS 2024 and will be presented in Abu Dhabi this October. Additional details can be found in the supplemental document.
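The self-monitoring idea can be caricatured as follows (a sketch with made-up robustness scores, not the paper's algorithm): score every demonstration and every policy rollout with STL robustness, and declare extrapolation beyond the demonstrator when even the policy's worst rollout beats the best demonstration.

```python
def extrapolates_beyond_demos(policy_robustness, demo_robustness):
    """True when the policy's worst rollout outperforms the best demonstration.

    Both arguments are lists of STL robustness scores, one per trajectory.
    """
    return min(policy_robustness) > max(demo_robustness)

demo_scores = [0.2, 0.5, -0.1]   # robustness of each (noisy) human demo
policy_scores = [0.6, 0.8]       # robustness of learned-policy rollouts
improved = extrapolates_beyond_demos(policy_scores, demo_scores)
```

A monitor of this kind gives the agent a concrete, specification-grounded signal for when to keep adapting and when the demonstrations have been surpassed.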

  • Puranic, A. G., Deshmukh, J. V., & Nikolaidis, S. (2024). Signal Temporal Logic-Guided Apprenticeship Learning. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
  • Puranic, A. G., Deshmukh, J. V., & Nikolaidis, S. (2021). Learning From Demonstrations Using Signal Temporal Logic in Stochastic and Continuous Domains. IEEE Robotics and Automation Letters (RA-L). Presented at IROS, 6(4), 6250–6257.
  • Puranic, A. G., Deshmukh, J. V., & Nikolaidis, S. (2021). Learning from Demonstrations using Signal Temporal Logic. Proceedings of the 2020 Conference on Robot Learning (CoRL), 155, 2228–2242.


Left: Generating graphs that explain demonstrator performance and formal specification conflicts. Center: Neural reward modeling from inferred graphs. Right: Mining formal specifications from time-series data.
Generating explainable temporal logic graphs from human data

Understanding and evaluating human demonstrations and learned robot behaviors plays a critical role in optimizing robot control policies; without such evaluation, a robot may infer incorrect reward functions that lead to undesirable or unsafe control policies. The prior LfD-STL framework required demonstrators to explicitly specify their preferences by ranking the STL specifications, with the ranked specifications represented as a directed acyclic graph (DAG) capturing preferences and dependencies. To relax this manual burden, our novel Performance Graph Learning (PeGLearn) algorithm automatically infers the specification DAG from demonstrations. We show that PeGLearn facilitates explainability for AI-based systems via a user study in CARLA, a simulated driving environment. We also integrate human feedback (annotations) in a robot-assisted surgical domain to model surgeon behaviors according to their expertise. Additional details can be found in the supplemental document.
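To make the DAG representation concrete, here is a hypothetical sketch (the specification names and the depth-based weighting are illustrative, not the PeGLearn algorithm itself): specifications are nodes, edges point from higher-priority specifications to those they dominate, and specifications closer to the roots receive larger reward weights.

```python
def spec_weights(dag):
    """Assign each specification a weight from its depth in the preference DAG.

    dag: adjacency dict mapping each spec to the specs it dominates.
    Roots (highest-priority specs) receive the largest weights.
    """
    parents = {n: [] for n in dag}
    for node, children in dag.items():
        for child in children:
            parents[child].append(node)

    memo = {}
    def depth(n):
        # Longest distance from any root, via memoized DFS over parents.
        if n not in memo:
            memo[n] = 0 if not parents[n] else 1 + max(depth(p) for p in parents[n])
        return memo[n]

    max_d = max(depth(n) for n in dag)
    # Shallower (higher-priority) specs get exponentially larger weights.
    return {n: 2.0 ** (max_d - depth(n)) for n in dag}

# "safety" dominates "reach_goal", which dominates "comfort".
dag = {"safety": ["reach_goal"], "reach_goal": ["comfort"], "comfort": []}
weights = spec_weights(dag)
```

Inferring this graph automatically from demonstrations, rather than asking users to rank specifications by hand, is precisely the burden PeGLearn removes.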

Learning (mining) specifications from temporal data

Autonomous cyber-physical systems such as self-driving cars, unmanned aerial vehicles, general-purpose robots, and medical devices can often be modeled as systems of heterogeneous components. Understanding the behavior of such components, especially those equipped with deep learning, at an abstract level is a significant challenge. Our work seeks to answer: given a requirement on the system output behaviors, what are the assumptions on the model environment, i.e., inputs to the model, that guarantee the corresponding output traces satisfy the output requirement? We develop techniques involving decision-tree classifiers, counterexample-guided learning, optimization, enumeration, and parameter mining to extract STL specifications that explain system behaviors.
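A toy instance of parameter mining (a deliberate simplification of the techniques above): given a set of observed output traces and the template specification G(x < c), find the tightest threshold c for which every trace satisfies the formula.

```python
def mine_tightest_bound(traces, margin=0.0):
    """Mine the smallest c such that G(x < c) holds on all traces.

    Any c strictly above the largest observed value satisfies the template;
    the tightest such bound is the data's supremum plus an optional margin.
    """
    return max(max(trace) for trace in traces) + margin

traces = [
    [0.1, 0.9, 0.4],
    [0.3, 0.7, 1.1],
]
c = mine_tightest_bound(traces, margin=0.05)
# Every observed value is strictly below the mined bound.
assert all(x < c for trace in traces for x in trace)
```

Real templates have multiple interacting parameters (thresholds, time bounds), which is where the enumerative and optimization-based machinery becomes necessary.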

  • Puranic, A. G., Deshmukh, J. V., & Nikolaidis, S. (2023). Learning Performance Graphs From Demonstrations via Task-Based Evaluations. IEEE Robotics and Automation Letters (RA-L). Oral Presentation at ICRA, 8(1), 336–343.
  • Mohammadinejad, S., Deshmukh, J. V., Puranic, A. G., Vazquez-Chanlatte, M., & Donzé, A. (2020). Interpretable Classification of Time-Series Data Using Efficient Enumerative Techniques. Proceedings of the 23rd International Conference on Hybrid Systems: Computation and Control (HSCC).
  • Mohammadinejad, S., Deshmukh, J. V., & Puranic, A. G. (2020). Mining Environment Assumptions for Cyber-Physical System Models. 2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS), 87–97.


Evaluating the quality of vision-based perception algorithms

Computer vision is one of the major perception components of a cyber-physical system, with numerous applications in autonomous vehicles, industrial/factory robotics, medical devices, etc. Checking the correctness and ensuring the robustness of perception algorithms, such as those based on deep convolutional neural networks, is a major challenge. Conventionally, perception algorithms are tested by comparing their performance against ground-truth labels, which requires a laborious annotation process. We propose Timed Quality Temporal Logic (TQTL) as a formal language for expressing desirable spatio-temporal properties of a perception algorithm processing a video, offering an alternative metric that can provide useful information even in the absence of ground-truth labels.
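As a hypothetical illustration of the kind of property such a logic can express (a hand-rolled check, not the TQTL monitor): "every object detected with high confidence in a frame must still be detected in the next frame." Violations flag temporally inconsistent detections with no ground-truth labels involved.

```python
def check_persistence(frames, conf_threshold=0.8):
    """Flag (frame_index, object_id) pairs where a confident detection vanishes.

    frames: list of dicts mapping object id -> detection confidence, one per frame.
    """
    violations = []
    for t in range(len(frames) - 1):
        for obj_id, conf in frames[t].items():
            if conf >= conf_threshold and obj_id not in frames[t + 1]:
                violations.append((t, obj_id))
    return violations

# Three frames of detections: pedestrian "ped1" flickers out at frame 1.
frames = [
    {"ped1": 0.95, "car3": 0.70},
    {"car3": 0.75},
    {"ped1": 0.90, "car3": 0.72},
]
flicker = check_persistence(frames)  # [(0, "ped1")]
```

A metric built from such properties scores a detector on internal spatio-temporal consistency, exactly the kind of signal that survives when annotation budgets run out.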

Vision-based metric for evaluating surgeon’s performance

Due to the lack of instrument force feedback during robot-assisted surgery, tissue-handling technique is an important aspect of surgical performance to assess. We develop a vision-based machine learning algorithm for object detection and distance prediction to measure needle entry point deviation in tissue during robotic suturing as a proxy for tissue trauma.

  • Balakrishnan, A., Puranic, A. G., Qin, X., Dokhanchi, A., Deshmukh, J. V., Ben Amor, H., & Fainekos, G. (2019). Specifying and Evaluating Quality Metrics for Vision-based Perception Systems. 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1433–1438.
  • Puranic, A., Chen, J., Nguyen, J., Deshmukh, J., & Hung, A. (2019). MP35-04 Automated Evaluation of Instrument Force Sensitivity During Robotic Suturing Utilizing Vision-Based Machine Learning. Journal of Urology, 201(Supplement 4), e505–e506.