Summary of Dr. Poria’s Research Statement on Multimodal AI and NLP

9 May 2023

Online user-generated data is often multimodal, combining visual, audio, and text channels. This type of content is highly valuable to enterprises because the information it contains can be used to improve user engagement through better recommender systems and AdSense. However, analyzing this data effectively requires integrating information from multiple modalities, which presents significant technical challenges for the development of AI agents. Our research group develops techniques for complex multimodal tasks such as text-to-audio generation, video generation, and emotion recognition. We address the challenge of multimodal information fusion through representation learning and mutual information maximization. Our research has demonstrated that fusing information from multiple modalities improves performance over unimodal systems. Our open-source code has been widely adopted by both academia and industry, enabling others to build on our work. Our text-to-audio multimodal foundation model, TANGO, has been downloaded thousands of times, indicating its broad applicability in industry.

Multimodal data can be prone to channel corruption due to intermediate noise, and certain modalities may carry more valuable information than others. Consequently, it is crucial to develop robust systems that can handle such challenges. Our research group addresses these issues by developing robust multimodal machine learning algorithms. Our research draws on state-of-the-art techniques, including modality imputation using optimal transport, input denoising, and modality dropout to make the backbone network more robust. Our ultimate goal is to create robust multimodal approaches that outperform non-robust systems, improving the accuracy and reliability of AI applications.
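As a rough illustration of the modality dropout idea mentioned above, the following minimal sketch zeroes out whole modalities at random during training so that a downstream network cannot rely on any single channel. It is not the group's implementation: the function names `modality_dropout` and `fuse`, the `p_drop` parameter, the zero-filling strategy, and the concatenation-based fusion are all assumptions made for this example.

```python
import random
from typing import Dict, List, Optional


def modality_dropout(
    features: Dict[str, List[float]],
    p_drop: float = 0.3,
    rng: Optional[random.Random] = None,
) -> Dict[str, List[float]]:
    """Randomly zero out whole modalities, always keeping at least one.

    Hypothetical helper: `features` maps a modality name (e.g. "text")
    to its feature vector; `p_drop` is the per-modality drop probability.
    """
    rng = rng or random.Random()
    names = list(features)
    kept = [name for name in names if rng.random() >= p_drop]
    if not kept:
        # Never drop everything: keep one modality chosen at random.
        kept = [rng.choice(names)]
    return {
        name: list(vec) if name in kept else [0.0] * len(vec)
        for name, vec in features.items()
    }


def fuse(features: Dict[str, List[float]]) -> List[float]:
    """Fuse by concatenation in a fixed (sorted) modality order (an assumption)."""
    return [x for name in sorted(features) for x in features[name]]


if __name__ == "__main__":
    sample = {"text": [0.2, 0.7], "audio": [0.5], "video": [0.1, 0.9]}
    dropped = modality_dropout(sample, p_drop=0.5, rng=random.Random(42))
    print(fuse(dropped))
```

Dropping is applied only at training time in this sketch; at inference the full set of available modalities would be passed to `fuse` unchanged.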

Our research team is dedicated to tackling the challenge of commonsense reasoning in natural language processing (NLP). Despite significant advances, language models (LMs) still perform poorly in this area. We have therefore focused on developing AI models and tasks specifically tailored to contextual commonsense reasoning. Our approach incorporates commonsense knowledge into deep learning models to improve their performance on various downstream tasks, such as emotion recognition, dialogue understanding, and sentence order prediction. Additionally, we introduced a novel and challenging commonsense reasoning task that requires AI models to answer causal questions by leveraging in-context speculation and creative thinking. Our ultimate goal is to develop robust and effective AI models that can reason accurately about the world and perform well on a wide range of NLP tasks.

Our research team has made significant contributions to the field of dialogue understanding over the past few years. Our focus has been on extracting implicit knowledge triplets from dialogues and emotion recognition in conversations, which are essential for enterprises that rely on chatbots for customer interactions. Our open-source dialogue context modeling algorithms, developed using advanced techniques such as transformers and graph neural networks (GNNs), have been widely adopted in this research area. Additionally, we have created large-scale datasets, which have helped to establish this research direction as a key subfield of dialogue system research.

Currently, we are addressing two pressing issues in the field of AI: 1) the time-consuming process of annotating data for supervised learning and 2) the environmental impact of running large AI models, whose GPU utilization produces high carbon emissions. To address these challenges, we have been actively exploring resource- and parameter-efficient techniques. Our efforts have resulted in several approaches, including contextual prompts for language understanding, language model prompting to augment datasets for zero-shot NLP, and adapters that improve the performance of language models on various benchmarks. We have also developed parameter-efficient solutions for speech processing. These techniques can significantly reduce the amount of annotated data needed for supervised learning and reduce the carbon footprint of large AI models, making them more sustainable and cost-effective.
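To make the parameter-efficiency argument concrete, here is a minimal sketch of one common adapter design: a small bottleneck with a residual connection inserted into an otherwise frozen model. The text above does not specify which adapter variant the group uses, so the bottleneck form, the ReLU nonlinearity, the omission of biases, and the names `adapter` and `adapter_param_count` are assumptions for illustration only.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix `w` (rows x cols) by vector `x` (length cols)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def adapter(h: Vector, w_down: Matrix, w_up: Matrix) -> Vector:
    """Bottleneck adapter with a residual: h + W_up * relu(W_down * h).

    `w_down` has shape (bottleneck, d_model) and `w_up` has shape
    (d_model, bottleneck); only these two matrices would be trained.
    """
    z = [max(0.0, v) for v in matvec(w_down, h)]
    return [hi + ui for hi, ui in zip(h, matvec(w_up, z))]


def adapter_param_count(d_model: int, bottleneck: int) -> int:
    """Weights in the two projections (biases omitted in this sketch)."""
    return 2 * d_model * bottleneck


if __name__ == "__main__":
    h = [1.0, 2.0]
    print(adapter(h, w_down=[[1.0, 1.0]], w_up=[[1.0], [0.5]]))
    # Illustrative sizes only: hidden size 768, bottleneck 64.
    print(adapter_param_count(768, 64), "adapter weights vs", 768 * 768,
          "weights in one dense 768x768 layer")
```

With the illustrative sizes above, the adapter adds 98,304 trainable weights, about one sixth of a single dense 768x768 projection, which is where the savings in training compute come from when the rest of the model stays frozen.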

About the Author

Congratulations to Assistant Professor Soujanya Poria for being awarded “10 to watch in AI” in 2022

Soujanya Poria is an Assistant Professor at SUTD. He earned his Ph.D. in computer science from the University of Stirling, U.K., and was awarded the prestigious Nanyang Technological University Presidential Postdoctoral Fellowship in 2018. Before his fellowship, he served as a scientist at A*STAR and Temasek Laboratory, NTU. He has published more than 100 papers and articles in leading conferences and journals such as ACL, EMNLP, AAAI, NAACL, ECCV, Neurocomputing, IEEE Transactions on Affective Computing, IEEE Computational Intelligence Magazine (IEEE CIM), and Information Fusion, and his research has received substantial funding from both government and industry. His work has been recognized internationally, including with the IEEE CIM Outstanding Paper Award and an ACM ICMI Best Paper Award Honorable Mention. He has held prominent roles at numerous conferences and workshops, including serving as area co-chair at ACL, NAACL, and EMNLP and as workshop chair at AACL 2022. He has given invited talks at events such as CICLing 2018, SocialNLP 2019, MICAI 2020, and ICON 2020. He currently serves as an associate editor for Cognitive Computation, Information Fusion, and Neurocomputing. Dr. Poria’s work is highly cited, with more than 21,000 Google Scholar citations. He has received several career awards, including IEEE IS 10 to Watch in AI in 2022 and Aminer’s AI2000 highly influential honorable mention award in 2022.

Assistant Professor Soujanya Poria has been recognized as one of the “10 to watch in AI” awardees in 2022. This biennial award, given by IEEE to young researchers working in AI, is highly prestigious and follows an extremely competitive selection process. The award is also featured in a post on the IEEE Computer Society Tech News blog. Assistant Professor Soujanya Poria heads the DeCLaRe research group, which is affiliated with SUTD’s AI Mega Centre. He also serves as the Deputy Sector Lead (Research) for the SUTD AI Sector.