research
overview of my research and selected projects
NEURO-SYMBOLIC AI
Learning reward functions and control policies that satisfy temporal-logic specifications
Designing dense or “informative” reward functions for Reinforcement Learning (RL) is a highly non-trivial task, and errors in reward design can lead to unsafe and undesirable learned control behaviors. This work introduces a neuro-symbolic learning-from-demonstrations (LfD) framework that uses high-level task specifications expressed in Signal Temporal Logic (STL), together with user demonstrations, to extract reward functions and control policies via reinforcement learning. The LfD-STL framework enables an agent to learn non-Markovian/temporal rewards and overcome critical safety and performance issues that arise with inverse reinforcement learning methods. The framework was initially developed for discrete, deterministic environments and later generalized to continuous spaces and stochastic environments via Gaussian processes and neural-network modeling.
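To make the reward construction concrete: the quantitative semantics of STL assign every trajectory a robustness degree measuring how strongly it satisfies a specification. The sketch below is a toy illustration, not the LfD-STL implementation; the trajectory, predicates, and weights are invented. It shows only how “always” and “eventually” robustness values over a finite trace can be combined into a non-Markovian episode reward.

```python
import numpy as np

def rob_always(signal, margin):
    """Robustness of G(margin > 0): the worst-case margin along the trace."""
    return min(margin(x) for x in signal)

def rob_eventually(signal, margin):
    """Robustness of F(margin > 0): the best margin attained along the trace."""
    return max(margin(x) for x in signal)

# A 1-D trajectory that must always stay above 0 (safety)
# and eventually exceed 0.8 (goal-reaching).
traj = np.array([0.1, 0.3, 0.5, 0.9, 0.7])
safety = rob_always(traj, lambda x: x - 0.0)    # rho(G(x > 0))
goal = rob_eventually(traj, lambda x: x - 0.8)  # rho(F(x > 0.8))

# A non-Markovian episode reward can weight the robustness degrees,
# e.g., prioritizing safety over goal-reaching:
reward = 2.0 * safety + 1.0 * goal
print(f"safety: {safety:.2f}, goal: {goal:.2f}, reward: {reward:.2f}")
```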
Learning to improve/extrapolate beyond demonstrator performance
Generally, a machine learning model’s performance is determined by the quality and quantity of the data it is trained on. Noisy data and limited human demonstrations, both common in robotic settings, therefore make it challenging to learn optimal behaviors. This work on neuro-symbolic apprenticeship learning implements temporal logic-guided reinforcement learning from demonstrations to automatically improve robot safety and performance via self-monitoring and adaptation. The capabilities of the framework are exhibited on a variety of mobile-navigation, fixed-base manipulation, and mobile-manipulation tasks in the NVIDIA Isaac simulator. This paper has been accepted at IROS 2024 and will be presented in Abu Dhabi this October. Additional details can be found in the supplemental document.
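The loop below is a heavily simplified sketch of the self-monitoring idea: a random hill-climb over a 1-D setpoint stands in for the actual RL algorithm, and the specification, policy parameterization, and constants are all invented. The mechanism it illustrates is the essential one, though: rollouts are promoted only when their STL robustness exceeds the best demonstration, which is what lets the agent extrapolate beyond demonstrator performance.

```python
import numpy as np

rng = np.random.default_rng(0)

def robustness(traj):
    # Toy spec: G(-1 < x < 1) AND F(x > 0.9); robustness is the worse of the two.
    safety = np.min(np.minimum(traj + 1.0, 1.0 - traj))
    goal = np.max(traj - 0.9)
    return min(safety, goal)

def rollout(setpoint):
    # Stand-in "policy": a trajectory sampled around a learned setpoint.
    return setpoint + 0.05 * rng.standard_normal(20)

# Noisy, suboptimal demonstrations: neither reliably reaches x > 0.9.
demos = [rollout(0.6), rollout(0.7)]
best = max(robustness(d) for d in demos)
demo_best = best

setpoint = 0.7  # initialize near the demonstrations
for _ in range(200):
    candidate = setpoint + 0.05 * rng.standard_normal()  # policy-update step
    score = robustness(rollout(candidate))
    if score > best:                        # self-monitoring against the specs:
        setpoint, best = candidate, score   # adopt only verified improvements

print(f"demo robustness: {demo_best:.2f} -> learned robustness: {best:.2f}")
```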
References
- Puranic, A. G., Deshmukh, J. V., & Nikolaidis, S. (2024). Signal Temporal Logic-Guided Apprenticeship Learning. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
- Puranic, A. G., Deshmukh, J. V., & Nikolaidis, S. (2021). Learning From Demonstrations Using Signal Temporal Logic in Stochastic and Continuous Domains. IEEE Robotics and Automation Letters (RA-L). Presented at IROS, 6(4), 6250–6257. https://doi.org/10.1109/LRA.2021.3092676
- Puranic, A. G., Deshmukh, J. V., & Nikolaidis, S. (2021). Learning from Demonstrations Using Signal Temporal Logic. Proceedings of the 2020 Conference on Robot Learning (CoRL), 155, 2228–2242. https://proceedings.mlr.press/v155/puranic21a.html
INTERPRETABLE/EXPLAINABLE AI (xAI)
Generating explainable temporal logic graphs from human data
Understanding and evaluating human demonstrations and learned robot behaviors plays a critical role in optimizing control policies; without it, a robot may infer incorrect reward functions that lead to undesirable or unsafe control policies. The prior LfD-STL framework required demonstrators to explicitly specify their preferences by ranking the STL specifications, with the ranked specifications represented as a directed acyclic graph (DAG) capturing preferences and dependencies. To alleviate this manual burden, we automatically infer the specification DAG from demonstrations via our novel Performance Graph Learning (PeGLearn) algorithm. We demonstrate through a user study in CARLA, a simulated driving environment, that PeGLearn facilitates explainability for AI-based systems, and we integrate human feedback (annotations) in a robot-assisted surgical domain to model the behaviors of surgeons according to their expertise. Additional details can be found in the supplemental document.
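The sketch below conveys the core graph-construction idea in a simplified form; the robustness table and specification names are fabricated, and PeGLearn’s actual per-demonstration graph aggregation is more involved. Specifications that demonstrators satisfy more strongly become ancestors in the DAG of those they satisfy weakly.

```python
import numpy as np

# Hypothetical robustness table: rows = demonstrations, columns = STL specs
# (in practice these scores come from evaluating each demonstration trace
# against each specification).
specs = ["G(no_collision)", "G(speed < limit)", "F(reach_goal)"]
rho = np.array([[0.9, 0.5, 0.2],
                [0.8, 0.6, -0.1],
                [0.7, 0.4, 0.3]])

# Specs that demonstrators satisfy more strongly are treated as
# higher-priority; ordering them induces the edges of the performance DAG.
mean_rho = rho.mean(axis=0)
order = np.argsort(-mean_rho)
edges = [(specs[order[i]], specs[order[i + 1]]) for i in range(len(order) - 1)]
for parent, child in edges:
    print(f"{parent}  ->  {child}")
```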
Learning (mining) specifications from temporal data
Autonomous cyber-physical systems such as self-driving cars, unmanned aerial vehicles, general-purpose robots, and medical devices can often be modeled as systems of heterogeneous components. Understanding the behavior of such components, especially those that incorporate deep learning, at an abstract level is thus a significant challenge. Our work seeks to answer: given a requirement on the system output behaviors, what are the assumptions on the model environment, i.e., the inputs to the model, that guarantee that the corresponding output traces satisfy the output requirement? We develop techniques involving decision-tree classifiers, counterexample-guided learning, optimization, enumeration, and parameter mining to extract STL specifications that explain system behaviors.
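As a toy illustration of the decision-tree route (the input features, the requirement, and the ground-truth region are all invented; in the actual work the labels come from simulating the model and checking its output traces against the requirement), a shallow tree over the input space yields axis-aligned constraints that read off directly as a mined environment assumption:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)

# Hypothetical setup: model inputs are (initial_speed, road_slope), and the
# output requirement (say, "settling time < 5 s") holds only on part of the
# input space. The labels below stand in for checking simulated output traces.
X = rng.uniform(low=[0.0, -10.0], high=[30.0, 10.0], size=(500, 2))
y = (X[:, 0] < 20.0) & (X[:, 1] < 5.0)

# A shallow decision tree yields axis-aligned input constraints that can be
# read off as a mined environment assumption (a precondition on the inputs).
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["initial_speed", "road_slope"]))
```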
References
- Puranic, A. G., Deshmukh, J. V., & Nikolaidis, S. (2023). Learning Performance Graphs From Demonstrations via Task-Based Evaluations. IEEE Robotics and Automation Letters (RA-L). Oral Presentation at ICRA, 8(1), 336–343. https://doi.org/10.1109/LRA.2022.3226072
- Mohammadinejad, S., Deshmukh, J. V., Puranic, A. G., Vazquez-Chanlatte, M., & Donzé, A. (2020). Interpretable Classification of Time-Series Data Using Efficient Enumerative Techniques. Proceedings of the 23rd International Conference on Hybrid Systems: Computation and Control (HSCC). https://doi.org/10.1145/3365365.3382218
- Mohammadinejad, S., Deshmukh, J. V., & Puranic, A. G. (2020). Mining Environment Assumptions for Cyber-Physical System Models. 2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS), 87–97. https://doi.org/10.1109/ICCPS48487.2020.00016
COMPUTER VISION
Evaluating the quality of vision-based perception algorithms
Computer vision is one of the major perception components of a cyber-physical system, with numerous applications in autonomous vehicles, industrial/factory robotics, medical devices, etc. Checking the correctness and ensuring the robustness of perception algorithms, such as those based on deep convolutional neural networks, is a major challenge. Conventionally, perception algorithms are tested by comparing their performance to ground-truth labels, which requires a laborious annotation process. We propose the use of Timed Quality Temporal Logic (TQTL) as a formal language to express desirable spatio-temporal properties of a perception algorithm processing a video, offering an alternative metric that can provide useful information even in the absence of ground-truth labels.
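The toy check below gives the flavor of such a property in a simplified, hand-written form; TQTL itself additionally reasons over object IDs, bounding boxes, and timing bounds, and the detections here are fabricated. The property, checkable without any ground-truth labels, says a high-confidence detection should not vanish in the very next frame:

```python
# Hypothetical per-frame detections from an object detector:
# each frame is a list of (class_label, confidence) pairs.
video = [
    [("car", 0.90), ("pedestrian", 0.85)],
    [("car", 0.88)],                        # the pedestrian flickers out
    [("car", 0.91), ("pedestrian", 0.70)],
]

def has_pedestrian(frame, thresh):
    return any(c == "pedestrian" and p >= thresh for c, p in frame)

# Simplified sanity property, checkable without ground truth:
# G( high-confidence pedestrian at t  =>  some pedestrian at t+1 ).
violations = [t for t in range(len(video) - 1)
              if has_pedestrian(video[t], 0.8)
              and not has_pedestrian(video[t + 1], 0.5)]
print("property violated at frames:", violations)  # -> [0]
```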
Vision-based metric for evaluating a surgeon’s performance
Due to the lack of instrument force feedback during robot-assisted surgery, tissue-handling technique is an important aspect of surgical performance to assess. We develop a vision-based machine learning algorithm for object detection and distance prediction to measure needle entry point deviation in tissue during robotic suturing as a proxy for tissue trauma.
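The metric itself is simple once the entry points are localized; the sketch below uses fabricated pixel coordinates in place of the outputs of the object-detection and distance-prediction models:

```python
import numpy as np

# Hypothetical pipeline outputs (pixel coordinates): the intended entry
# points marked on tissue vs. the detected actual needle entry points.
target_points = np.array([[120, 340], [200, 355], [285, 338]])
actual_points = np.array([[124, 346], [210, 351], [286, 352]])

# Needle entry-point deviation per suture throw: the Euclidean distance
# between intended and actual entry, used as a proxy for tissue trauma.
deviation_px = np.linalg.norm(actual_points - target_points, axis=1)
print("per-throw deviation (px):", deviation_px.round(1))
print("mean deviation (px):", round(float(deviation_px.mean()), 1))
```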
References
- Balakrishnan, A., Puranic, A. G., Qin, X., Dokhanchi, A., Deshmukh, J. V., Ben Amor, H., & Fainekos, G. (2019). Specifying and Evaluating Quality Metrics for Vision-based Perception Systems. 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1433–1438. https://doi.org/10.23919/DATE.2019.8715114
- Puranic, A., Chen, J., Nguyen, J., Deshmukh, J., & Hung, A. (2019). MP35-04 Automated Evaluation of Instrument Force Sensitivity During Robotic Suturing Utilizing Vision-Based Machine Learning. Journal of Urology, 201(Supplement 4), e505–e506. https://doi.org/10.1097/01.JU.0000555994.79498.94